Introduction

Prostate cancer accounts for one-third of noncutaneous cancers diagnosed in US men,1 is a
leading cause of cancer-related death and is, appropriately, the subject of heightened public
awareness and widespread screening. If prostate-specific antigen (PSA)2 or digital rectal screens
are abnormal,3 a biopsy is considered to detect or rule out cancer. The pathologic status of biopsied
tissue forms the definitive diagnosis of prostate cancer and constitutes an important cornerstone
of therapy and prognosis.4 There is, hence, a need to add useful information to diagnoses and to
introduce new technologies that allow efficient analyses of cancer to focus limited healthcare
resources. For the reasons outlined above, there is an urgent need for high-throughput,
automated and objective pathology tools. Our general hypothesis is that these requirements can be
satisfied through innovative spectroscopic imaging approaches that are compatible with, and add
substantially to, current pathology practice. Hence, the overall aim of this project is to
demonstrate the utility of novel Fourier transform infrared (FTIR) spectroscopy-based,
computer-aided diagnoses for prostate cancer and to develop the required microscopy and software
tools to enable its application.

FTIR spectroscopic imaging is a new technique that combines the spatial specificity of optical
microscopy and the biochemical content of spectroscopy.5 As opposed to thermal infrared
imaging, FTIR imaging measures the absorption properties of tissue through a spectrum
consisting of (typically) 1024 to 2048 wavelength elements per pixel.6 Since mid-IR (2-12 µm
wavelength) spectra reflect the molecular composition of the tissue, image contrast arises from
differences in endogenous chemical species. As opposed to visible microscopy of stained tissue,
which requires a human eye to detect changes, numerical computation is required to extract
information from IR spectra of unstained tissue. Extracted information, based on a computer
algorithm, is inherently objective and automated. Recent work has demonstrated that these
determinations are also accurate and reproducible in large patient populations.7 Hence, we
focused, in the first year of this project, on demonstrating that the laboratory results could be
optimized using novel approaches to fast imaging. This is a critical step, since we propose next
to analyze 375 radical prostatectomy samples. We have been able to optimize data acquisition
parameters and develop a novel algorithm for processing data that enables almost 50-fold faster
imaging. Briefly, the idea behind the process is illustrated in Figure 1. In this performance
period, we sought to use acquired data to establish the use of IR imaging for validating cancer
diagnosis (task 2), develop a calibration and prediction model for grading and perform extensive
validation (task 2). Finally, we sought to develop a mathematical framework to relate disparate
pieces of information to outcome (task 3).

Figure 1. (A) Conventional imaging in pathology requires dyes and a human to recognize
cells. In chemical imaging data cubes (B), both a spectrum at any pixel (C) and the spatial
distribution of any spectral feature can be seen, e.g. in (D) nucleic acids (left, at ~1080
cm^-1) and collagen-specific (right, at ~1245 cm^-1). Computational tools can then convert
chemical imaging data to knowledge used in pathology (E).


Body 

Specific activities and tasks as per the statement of work during this performance period are
described below. Details of performance for past periods are given in the past annual
reports, which are attached for quick reference by the reviewers.

Task 1. Perform infrared spectroscopic imaging on prostate biopsy specimens

All activities for this task were completed in years 1 and 2.

Task 2. Analyze spectroscopic imaging data for biochemical markers of tumor and develop
numerical algorithms for grading cancer

Goal: Develop an algorithm for malignancy recognition. Models will be constructed and optimized
using Genetic Algorithms operating on identified metrics. Models will be tested and validated
using ROC curves with pathologist marking as the ground truth. A protocol for segmenting
benign from atypical conditions will be available. (Months 11-18) The specific aims from the
statement of work (SOW) are:

a.  Identify  samples  to  be  imaged  (Months  1-3)  by  examining  stained  slides 

b.  Obtain  unstained  samples  to  be  imaged  and  define  regions  for  calibration  and 
validation  (Months  4-7) 

c.  Perform  histologic  identification  on  prostate  samples  and  validate 

d.  Reduce  spectral  metrics  to  those  useful  in  identifying  atypia  (Months  8-12) 

e.  Develop  protocols  and  validate  distinction  between  benign-appearing  and  atypical 
tissue  (Months  12-18) 

f. Develop calibration for predicting cancer grade (Months 18-22)

g.  Develop  protocols  and  validate  Gleason  grading  of  tumor  (Months  18-27) 


Activities: Tasks 2a-2d were accomplished in years 1 and 2.


TASK  2E:  DEVELOP  PROTOCOLS  AND  VALIDATE  DISTINCTION  BETWEEN 
BENIGN-APPEARING  AND  ATYPICAL  TISSUE 

We were able to accomplish task 2e entirely, and a manuscript has been submitted (under
review). An invention disclosure was filed with the Office of Technology Management, which
then decided to file a preliminary patent on the work.

We developed a new fully automated method to classify cancer versus non-cancer prostate tissue
samples. The classification algorithm uses morphological features (geometric properties of
epithelial cells/nuclei and lumens) that are quantified based on H&E stained images as well as
FT-IR images of the samples. By restricting the features used to geometric measures, we sought
to mimic the pattern recognition process employed by human experts, and achieve a robust
classification procedure that can produce consistently high accuracy across independent data
sets. We systematically evaluate the performance of the new method through cross-validation,
and examine its robustness across data sets. We also summarize the specific morphological
features that prove to be most informative in classification.

Figure 2. IR imaging data and its use in histologic classification.


(Upper  row)  IR  imaging  data  (b)  is  acquired  for  an  unstained  tissue  section  (a).  The 
data  is  then  classified  into  cell  types  and  a  classified  image  (c)  is  obtained.  The  colors 
indicate  cell  types  in  a  histologic  model  of  prostate  tissue.  This  method  is  robust  and 
applied  to  hundreds  of  tissue  samples  using  the  tissue  microarray  (TMA)  format. 
(Lower  row)  H&E  (d)  and  IR  classified  (e)  images  of  a  part  of  the  TMAs  used. 

Methods:  Several  new  methods  were  developed  to  accomplish  the  task. 


We begin with a description of the computational pipeline. As noted above, a key aspect of our
approach is the use of FT-IR imaging data, acquired on a section serial to the H&E-stained one,
to enhance the segmentation of nuclei and lumens. The first two components of the pipeline
(§1-2) are geared to this functionality, while the next three components (§3-5) exploit the
segmented features obtained from image data to classify the tissue sample (Figure 3).

Figure 3. Overview of the approach.

(a,  b)  FTIR  spectroscopic  imaging  data-based  cell-type  classification  (IR  classified 
image),  is  overlaid  with  H&E  stained  image  (a),  leading  to  segmentation  of  nuclei  and 
lumens  in  a  tissue  sample  (b).  (c,d,e)  Features  are  extracted  and  selected  (c),  and  used 
by  the  classifier  (d)  to  predict  (e)  whether  the  sample  is  cancerous  or  benign. 


1.  Image  Registration 


Given two images, the image registration problem can be defined as finding the optimal spatial
and intensity transformation of one image to the other. Here, the two images are the H&E stained
and "IR classified" images, which were acquired from adjacent tissue samples. The IR classified
image represents the FT-IR imaging data, processed as indicated in Figure 2, to classify each
pixel as a particular cell type. Although the two samples were physically in the same intact tissue
and are structurally similar, the two images have different properties (total image and pixel sizes,
contrast mechanisms and data values). Hence, finding features with which to spatially register the
images is not trivial. The H&E image provides detailed morphological information that could
ordinarily be used for registration, but the IR image lacks such information. On the other hand,
the IR image specifies the exact areas corresponding to each cell type, but the difficulty in
precisely extracting such regions from the H&E image hinders us from using cell-type information
for registration. The only obvious features are macroscopic sample shape and empty space (lumens)
inside the samples. To utilize these two features and to avoid problems due to differences in the
two imaging techniques, both images are first converted into binary images. Due to the
binarization, the intensity transformation is not necessary. As a spatial transformation, we use an
affine transformation (f) where a coordinate (x1, y1) is transformed to the (x2, y2) coordinate after
translations (tx, ty), rotation by θ, and scaling by factor s.

Accordingly, we find the optimal parameters of the affine transformation that minimize the
absolute intensity difference between the two images (I_reference and I_target). In other words,
image registration amounts to finding the optimal parameter values

$$(t_x^*, t_y^*, \theta^*, s^*) = \arg\min_{t_x, t_y, \theta, s} \left| I_{reference} - f(I_{target}; t_x, t_y, \theta, s) \right|.$$

The downhill simplex method is applied to solve the above equation. An example of this
registration process is shown in Figure 4.
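
The cost being minimized can be sketched as follows, assuming small binary images stored as nested lists. The function names (`affine_point`, `warp_binary`, `registration_cost`), the rotate-scale-then-translate order, and the nearest-pixel forward mapping are assumptions of this sketch and are not taken from the report. The report minimizes this kind of cost with the downhill simplex method; the sketch only evaluates the cost for given parameters.

```python
import math

def affine_point(x, y, tx, ty, theta, s):
    """Rotate by theta, scale by s, then translate by (tx, ty).
    The order of operations is an assumption of this sketch."""
    c, sn = math.cos(theta), math.sin(theta)
    return s * (c * x - sn * y) + tx, s * (sn * x + c * y) + ty

def warp_binary(img, tx, ty, theta, s):
    """Forward-map each foreground pixel of a binary image to its nearest target pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                x2, y2 = affine_point(x, y, tx, ty, theta, s)
                xi, yi = round(x2), round(y2)
                if 0 <= xi < w and 0 <= yi < h:
                    out[yi][xi] = 1
    return out

def registration_cost(reference, target, params):
    """Sum of absolute pixel differences between reference and warped target."""
    warped = warp_binary(target, *params)
    return sum(abs(r - t)
               for ref_row, warp_row in zip(reference, warped)
               for r, t in zip(ref_row, warp_row))

# Toy example: a 2x2 block shifted by one pixel in x and y.
target = [[0] * 5 for _ in range(5)]
reference = [[0] * 5 for _ in range(5)]
for yy in (1, 2):
    for xx in (1, 2):
        target[yy][xx] = 1
        reference[yy + 1][xx + 1] = 1

cost_aligned = registration_cost(reference, target, (1, 1, 0.0, 1.0))
cost_identity = registration_cost(reference, target, (0, 0, 0.0, 1.0))
```

A forward mapping like this can leave holes when s is greater than 1; a production implementation would more likely resample by inverse mapping.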

Figure 4. Image Registration.


H&E  stained  images  and  IR  classified  images  are  first  converted  into  binary  images. 

The  IR  classified  image  is  overlaid  with  the  H&E  stained  image  by  affine 
transformation,  with  the  optimal  matching  being  found  by  minimizing  the  absolute 
intensity  difference  between  two  images.  After  registration,  original  annotations 
(color and/or cell-type information) of each image are restored.

2.  Identification  of  epithelial  cells  and  their  morphologic  features 

While a number of factors are known to be transformed in cancerous tissues, epithelial
morphology is utilized as the clinical gold standard. Hence, we focus here on the cellular and
nuclear morphology of epithelial nuclei and lumens. These structures are different in normal and
cancerous tissues, but are not widely used in automated analysis for two reasons. First, as
described above, simple detection of epithelium from H&E images is difficult. Second, detection
of epithelial nuclei may be confounded by a stromal response that is not uniform for all grades
and types of cancers. We focused first on addressing these two challenges, which hinder
automatically parsing morphologic features such as the size and number of epithelial nuclei and
lumens, distance from nuclei to lumens, geometry of the nuclei and lumens, and others (§3). In
order to use these properties, the first step is to detect nuclei and lumens correctly, and we sought
to develop a robust strategy for doing so.

2.1.  Lumen  Detection 

In H&E stained images, lumens are recognized as empty white spaces surrounded by
epithelial cells. In normal tissues, lumens are larger in diameter and can have a variety of shapes.
In cancerous tissues, lumens are progressively smaller with increasing grade and generally have
less distorted elliptical or circular shapes. Our strategy to detect lumens was to find empty areas
that are located next to areas rich in epithelium. White spots inside the sample can be found
from the H&E image, and the pixels corresponding to epithelial cells can be mapped onto the H&E
image from the IR classified image through image registration. We note that while lumens are
ideally completely surrounded by epithelial cells (called complete lumens), some samples have
lumens (called incomplete lumens) that violate this criterion because only a part of the lumen is
present in the sample. To identify these incomplete lumens, we use heuristic criteria based on the
size, shape, presence of epithelial cells and background around the areas, and distance from the
center of the tissue. (See Supplementary Materials for details.)

2.2.  Nucleus  Detection  -  single  epithelial  cells 

Epithelial nucleus detection by automated analysis is more difficult than lumen detection due to
variability in staining and in the experimental conditions under which the entire set of H&E images
was acquired. Differences between normal and cancerous tissues, and among different grades of
cancerous tissues, also hamper facile detection. To handle such variations and make the contrast
of the images consistent, we perform smoothing and adaptive histogram equalization prior to
nuclei identification. Nuclei are relatively dark and can be modeled as small elliptical areas in the
stained images. This geometrical model is often confounded, as multiple nuclei can be so close as
to appear like one large, arbitrarily shaped nucleus. Also, small folds or edge staining around
lumens can make the darker shaded regions difficult to analyze. Here, we exploit the information
provided by the IR classified image to limit ourselves to epithelial cells, and use a thresholding
heuristic on a color space-transformed image to identify nuclei with high accuracy. Epithelial
pixels that are identified on the H&E images using the IR overlay are dominated by
one of two colors: blue or pink, which arise from the nuclear and cytoplasmic components
respectively. For nuclei restricted to epithelial cells in this manner, a set of general observations
was made that led us to convert the stained image to a new color space "RG-B" (|R + G - B|).
(R, G, and B represent the intensity of the Red, Green, and Blue channels, respectively.) This
transformation, followed by suitable thresholding, was able to successfully characterize the areas
where nuclei are present. The threshold values are adaptively determined for the Red and Green
channels due to the variations in the color intensity. (See Supplementary Materials for details.)
Finally, after filling holes and gaps within nuclei by a morphological closing operation, the
segmentation of each nucleus is accomplished by using a watershed algorithm followed by
elimination of false detections. The size, shape, and average intensity are considered to identify
and remove artifactual nuclei. Figure 5 details the nucleus detection procedure.
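
The "RG-B" conversion itself is simple to illustrate. The sketch below assumes 8-bit RGB tuples, a single fixed threshold, and that low |R + G - B| values mark the blue, nucleus-like pixels. All three are assumptions of this sketch: the report determines its thresholds adaptively for the Red and Green channels (see Supplementary Materials), and the names `rg_b` and `nucleus_mask` are hypothetical.

```python
def rg_b(pixel):
    """Return |R + G - B| for one (R, G, B) pixel."""
    r, g, b = pixel
    return abs(r + g - b)

def nucleus_mask(image, threshold):
    """Mark pixels whose RG-B value falls below a fixed, illustrative threshold."""
    return [[rg_b(p) < threshold for p in row] for row in image]

# One bluish (nucleus-like) and one pinkish (cytoplasm-like) pixel.
row = [(80, 60, 150), (230, 150, 180)]
values = [rg_b(p) for p in row]
mask = nucleus_mask([row], threshold=50)
```

Bluish pixels have a large B channel, so R + G - B is small for them and large for pink pixels, which is what makes a threshold in this space separate the two.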

Figure 5. Nucleus Detection.

Smoothing  and  adaptive  histogram  equalization  are  performed  to  alleviate  variability 
in  H&E  stained  image  and  to  obtain  better  contrast.  “RG  -  B”  conversion  followed  by 
thresholding  characterizes  the  areas  where  nuclei  exist.  Morphological  closing 
operation  is  performed  to  fill  holes  and  gaps  within  nuclei,  and  a  watershed  algorithm 
segments each individual nucleus. The segmented nuclei are constrained by their shape,
size,  and  average  intensity  and  epithelial  cell  classification  (green  pixels)  provided  by 
the  overlaid  IR  image. 

3.  Feature  Extraction 

As mentioned above, the characteristics of nuclei and lumens change in cancerous tissues. In
normal tissue, epithelial cells are located mostly in thin layers around lumens. In cancerous
tissue, these cells generally grow to fill lumens, resulting in a decrease in the size of lumens, with
the shape of lumens becoming more elliptical or circular. The epithelial association with a lumen
becomes inconsistent, and epithelial foci may adjoin lumens or may also exist without an
apparent lumen. Epithelial cells invading the extra-cellular matrix also result in a deviation from
the well-formed lumen structure; this is well-recognized as a hallmark of cancer. Due to filling of
lumen space and invasion into the extra-cellular space, the number density of epithelial cells
increases in tissue. The size of individual epithelial cells and their nuclei also tends to increase as
the malignancy of a tumor increases. Motivated by such recognized morphological differences
between normal and cancerous tissues, we chose to use epithelial nuclei and lumens as the basis
of the several quantitative features that our classification system works with. (See examples of
such features in Figure 6.) It is notable that these observations are qualitative in actual clinical
practice and have not been previously quantified.

Figure 6. Example Features.


Each  panel  shows  one  example  feature,  along  with  the  distributions  of  the  feature’s 
values  for  cancer  (red)  and  benign  (blue)  classes. 

3.1.  Epithelial  cell-related  features 

We use epithelial cell type classification from IR data to measure epithelium-related features.
However, individual epithelial cells in the tissue are not easily delineated. Therefore, in addition
to features directly describing epithelial cells, we also quantify properties of epithelial nuclei,
which are available from the segmentation described in §2. The quantities we measure in
defining features are: (1) size of epithelial cells, (2) size of epithelial nuclei, (3) number of nuclei
in the sample, (4) distance from a nucleus to the closest lumen, (5) distance from a nucleus to the
epithelial cell boundary, (6) number of "isolated" nuclei (nuclei that have no neighboring nucleus
within a certain distance), (7) number of nuclei located "far" from lumens, and (8) entropy of the
spatial distribution of nuclei (Figure 6G). Supplementary Materials provide specifics of these
measures and their calculation.


3.2.  Lumen-related  features 

Features describing glands have been shown to be effective in PCa classification. Here, we try to
characterize lumens and mostly focus on the differences in the shape of the lumens. The
quantities we measure in defining these features are: (1) size of a lumen, (2) number of lumens,
(3) lumen "roundness", defined as [equation missing], where L_area is the size of the lumen and r
is the radius of a circle of size L_area, (4) lumen "distortion" (Figure 6A), defined as [equation
missing] in terms of the distance from the center of a lumen to the boundary of the lumen, where
AVG(·) and STD(·) represent the average and standard deviation, (5) lumen
"minimum bounding circle ratio" (Figure 6B), defined as the ratio of the size of a minimum
bounding circle of a lumen to the size of the lumen, (6) lumen "convex hull ratio" (Figure 6C),
which is the ratio of the size of a convex hull of a lumen to the size of the lumen, (7) symmetric
index of lumen boundary (Figure 6E, see Supplementary Materials), (8) symmetric index of
lumen area (Figure 6F, see Supplementary Materials), and (9) spatial association of lumens and
cytoplasm-rich regions (Figure 6D, see Supplementary Materials). Features (3) - (8) are various
ways to summarize lumen shapes, while feature (9) is motivated by the loss of functional
polarization of epithelial cells in cancerous tissues.
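
As an illustration of one shape feature, the convex hull ratio (6) can be computed for a lumen outlined as a polygon. This is a minimal sketch under assumptions: the lumen boundary is given as an ordered list of (x, y) vertices rather than a pixel mask, areas come from the shoelace formula, and the hull is built with Andrew's monotone chain algorithm. The function names are hypothetical and the report does not state how the hull was computed.

```python
def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as ordered (x, y) vertices."""
    total = 0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

def _cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def convex_hull_ratio(boundary):
    """Area of the convex hull divided by the area of the lumen itself."""
    return polygon_area(convex_hull(boundary)) / polygon_area(boundary)

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
ratio_square = convex_hull_ratio(square)
ratio_l = convex_hull_ratio(l_shape)
```

A convex lumen gives a ratio of 1, and the ratio grows as the outline becomes more indented, so the feature separates smooth from irregular lumens.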


3.3.  Global  &  local  tissue  features 

We have described above the individual measures of epithelium and lumen related quantities that
form the basis of the features used by our classification system. Normally, these features have to
be summary measures over the entire tissue sample or desired classification area. Hence, we
employ the average (AVG) or standard deviation (STD), and in some cases the sum total (TOT), of
these quantities for further analysis. These features are called "global" features since they are
calculated from the entire sample. However, in some cases global features may be misleading,
especially where only a part of the tissue sample is indicative of cancer. Therefore, in addition to
global features, we define "local" features by sliding a rectangular window of a fixed size
(typically 100x100 pixels) throughout a tissue sample, computing the average or sum total of the
feature in each window, and computing the standard deviation and/or extrema over the values for
all windows (Figure 7). In all, 67 features (29 global and 38 local features) are defined, capturing
various aspects of tissue morphology.
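
The local-feature computation can be sketched as below for a per-pixel quantity stored as a nested list (for example, a binary nucleus mask). The window stride and the use of the population standard deviation are assumptions of this sketch, since the report does not specify them, and `local_feature` is a hypothetical name.

```python
from statistics import pstdev

def local_feature(grid, window, stride):
    """Sum a per-pixel quantity in each window, then summarize across windows."""
    h, w = len(grid), len(grid[0])
    totals = []
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            totals.append(sum(grid[y][x]
                              for y in range(top, top + window)
                              for x in range(left, left + window)))
    return {"std": pstdev(totals), "max": max(totals), "min": min(totals)}

# Toy 4x4 mask with nuclei concentrated in one corner.
mask = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
summary = local_feature(mask, window=2, stride=2)
```

The global total of this mask (5) hides the fact that four of the five nuclei sit in one window. The local maximum and standard deviation expose that concentration, which is the motivation for local features given above.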

4.  Feature  Selection 

Feature selection is the step where the classifier examines all available features (67 in our case)
with respect to the training samples, and selects a subset to use on test data. This selection is
generally based on the criterion of high accuracy on training data, but also strives to ensure
generalizability beyond the training data. We adopt a two-stage feature selection approach here.
In the first stage, we generate a set of candidate features (C_candidate) by using the so-called
minimum-redundancy-maximal-relevance (mRMR) criterion. In each iteration, given a feature
set chosen thus far, mRMR chooses the single additional feature that is least redundant with the
chosen features, while being highly correlated with the class label. C_candidate is a set of features
that is expected to be close to the optimal feature set for a dataset and a classifier under
consideration. It is constructed as follows. Given a feature set F = (f_1, ..., f_M) ordered by mRMR,
the AUC of the set of i top-ranked features is computed for varying values of i. We limit the value
of i to be < 30. The feature subset with the best AUC is chosen as C_candidate. In the second stage,
feature selection continues with C_candidate as the starting point, using the sequential floating
forward selection (SFFS) method. This method sequentially adds new features followed by
conditional deletion(s) of already selected features. Starting with C_candidate, SFFS searches for
a feature x ∉ C_candidate that maximizes the AUC among all feature sets C_candidate ∪ {x}, and adds
it to C_candidate. Then, it finds a feature x ∈ C_candidate that maximizes the AUC among all feature
sets C_candidate - {x}. If the removal of x improves the highest AUC obtained by C_candidate, x is
deleted from C_candidate. As long as this removal improves upon the highest AUC obtained so far,
the removal step is repeated. SFFS repeats the addition and removal steps until the AUC reaches
1.0 or the number of additions and deletions exceeds 20, and the feature set with the highest AUC
thus far is chosen as the optimal feature set. The classification capability of a feature set, required
for feature selection, is measured by the area under the ROC curve (AUC), obtained by
cross-validation on the training set.
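
The second-stage search can be sketched as follows. This is a minimal illustration of the add-then-conditionally-remove loop described above, not the project's implementation: `score` stands in for the cross-validated AUC, ties are broken alphabetically, and the toy scoring function uses integer percentages so that comparisons are exact. The names `sffs`, `toy_score`, `max_steps` and `target` are hypothetical.

```python
def sffs(candidate, all_features, score, max_steps=20, target=1.0):
    """Sequential floating forward selection starting from a candidate set."""
    current = set(candidate)
    best_set, best_score = set(current), score(current)
    steps = 0
    while best_score < target and steps < max_steps:
        remaining = sorted(set(all_features) - current)
        if not remaining:
            break
        # Forward step: add the feature that gives the highest score.
        x = max(remaining, key=lambda f: score(current | {f}))
        current.add(x)
        steps += 1
        s = score(current)
        if s > best_score:
            best_set, best_score = set(current), s
        # Backward steps: remove features while that beats the best score so far.
        while len(current) > 1 and steps < max_steps:
            y = max(sorted(current), key=lambda f: score(current - {f}))
            s = score(current - {y})
            if s <= best_score:
                break
            current.remove(y)
            steps += 1
            best_set, best_score = set(current), s
    return best_set, best_score

def toy_score(features):
    """Stand-in for AUC (in percent): 'a' and 'b' help, 'c' hurts, 'd' is neutral."""
    weights = {"a": 20, "b": 20, "c": -10, "d": 0}
    return 50 + sum(weights[f] for f in features)

selected, selected_score = sffs({"c"}, "abcd", toy_score, target=100)
```

In the toy run the harmful starting feature "c" is dropped by the backward step as soon as "a" has been added, which is the behavior that distinguishes floating selection from plain forward selection.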

5.  Classification 

We note that there are two levels of classification here. In the first, IR spectral data is used to
provide histologic images where each pixel has been classified as a cell type. In the second, the
measures from H&E images and IR images are used to classify tissue into disease states. In this
manuscript, we do not discuss the first classification task, as its development and results are
well-documented. For the latter task, we used a well-established classification algorithm, namely
the support vector machine (SVM). Two cost factors are introduced to deal with an imbalance in
training data. The ratio between the two cost factors was chosen as

$$\frac{C_+}{C_-} = \frac{\text{number of negative training examples}}{\text{number of positive training examples}}$$

to make the potential total cost of the false positives and the false negatives the same. (See
Supplementary Materials for details.)
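
A short worked example of this ratio, using the sample counts reported for the first data set in this report (66 benign, 115 cancer). Treating cancer as the positive class and fixing C_- at 1.0 are assumptions of this sketch, and `cost_ratio` is a hypothetical name.

```python
def cost_ratio(n_positive, n_negative):
    """C+/C- chosen as (number of negatives) / (number of positives)."""
    return n_negative / n_positive

n_pos, n_neg = 115, 66          # cancer (assumed positive) and benign counts
ratio = cost_ratio(n_pos, n_neg)

c_minus = 1.0                   # arbitrary reference value for this sketch
c_plus = ratio * c_minus

# With this choice, misclassifying every positive costs the same in total
# as misclassifying every negative.
total_cost_positives = c_plus * n_pos
total_cost_negatives = c_minus * n_neg
```

Here the ratio is about 0.574, so each error on the more numerous cancer class is weighted less than each error on the benign class.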

6.  Data  preparation 

All of the H&E stained images were acquired on a standard optical microscope at 40x
magnification. The size of each pixel is 0.9636 µm x 0.9636 µm. On the other hand, the pixel
size of IR images is 6.25 µm x 6.25 µm. The acquisition was described in previous
years' reports. Two data sets, stained under different conditions, were used in this study. The
first dataset ("Data1") consists of 66 benign samples and 115 cancer samples, and the second set
("Data2") includes 14 benign and 36 cancer samples. These were previously acquired under the
grant.

Results  and  discussion:  We  then  applied  the  methods  to  classify  prostate  tissue  and  the  results 
are  presented  below. 

1.  The  classification  system  achieves  AUC  greater  than  0.97  on  both  data  sets 

We first performed K-fold cross validation on each dataset. The data set was divided into K
roughly equal-sized partitions, one partition was left out as the "test data", and the classifier was
trained on the union of the remaining K - 1 partitions (the "training data") and evaluated on the
test data. This was repeated K times, with different choices of the left-out partition. (We set K =
10.) In each repetition, cross-validation on the training data was used to select the feature set
with the highest AUC as explained in §4. The correct and incorrect predictions in the test data,
across all K repetitions, were summarized into a ROC plot and the AUC was computed, along
with specificities when sensitivity equals 90, 95, or 99%. Since the cross-validation exercise
makes random choices in partitioning the data set, we examined averages of these performance
metrics over 10 repeats of the entire cross validation pipeline. The average AUCs for Data1 and
Data2 were 0.982 and 0.974 respectively (Table 1, "feature extraction" = "IR & HE"). At 90%,
95%, and 99% sensitivities, the average specificity achieved on Data1 was 94.76%, 90.91%, and
77.80% respectively, while that on Data2 was 92.53%, 84.19%, and 49.54% respectively.
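
The two kinds of metric reported here can be sketched for a list of classifier scores. This is an illustrative sketch, not the evaluation code used in the project: the AUC is computed as the fraction of (cancer, benign) pairs that are ranked correctly, and the threshold rule "predict cancer when score >= threshold" is an assumption. The function names and the toy scores are hypothetical.

```python
import math

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def specificity_at_sensitivity(pos_scores, neg_scores, sensitivity):
    """Specificity at the highest threshold that still detects the requested
    fraction of positives (predict positive when score >= threshold)."""
    k = max(1, math.ceil(sensitivity * len(pos_scores)))
    threshold = sorted(pos_scores, reverse=True)[k - 1]
    return sum(n < threshold for n in neg_scores) / len(neg_scores)

cancer = [0.9, 0.8, 0.7, 0.4]        # toy scores for cancer samples
benign = [0.6, 0.5, 0.3, 0.2, 0.1]   # toy scores for benign samples

toy_auc = auc(cancer, benign)
spec_at_75 = specificity_at_sensitivity(cancer, benign, 0.75)
spec_at_100 = specificity_at_sensitivity(cancer, benign, 1.0)
```

Demanding a higher sensitivity forces a lower threshold, so specificity can only stay the same or fall, which matches the pattern in the reported 90%, 95% and 99% figures.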

One way to interpret the above values is to examine our automated pipeline as a pre-screening
mechanism to identify the samples to be examined by a human pathologist. At a "true positive
rate" of 99% (which means that only 1% of the cancer samples will be missed by the screen), the
"false positive rate" is 22.2% (i.e., 22.2% of the benign samples will make it through the screen)
on average for Data1 (Table 1), thereby reducing the pathologist's workload on benign samples
by about 4.5-fold. While the error rate of manual pathology determinations is generally accepted
to be in the 1-5% range, inclusion of confounding cancer mimickers raises the rate to as high as
7.5%. Also noteworthy is the observation that the same algorithm performs consistently well on
both data sets, which were obtained under different staining conditions. This speaks to the
robustness of the classification framework, an attribute that we investigated further in the next
exercise.
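
The 4.5-fold figure follows directly from the Data1 specificity at 99% sensitivity quoted above. As a worked calculation, assuming the reduction is counted over benign samples only:

```latex
\text{false positive rate} = 100\% - 77.80\% = 22.2\%,
\qquad
\text{workload reduction} = \frac{1}{0.222} \approx 4.5
```

That is, roughly one benign sample in 4.5 would still reach the pathologist.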

2.  Classification  system  is  robust  to  staining  conditions 

Here, we trained a classifier on Data1 and tested its performance on Data2. We observed an
average AUC of 0.956, with average specificity of 88.57%, 81.92%, and 26.86% at sensitivity
equaling 90%, 95%, and 99% respectively (Table 2, "feature extraction" = "IR & HE"). These
values are competitive with the cross-validation results on Data2 (Table 1), where the training
and testing were both performed on (disjoint parts of) Data2.

3.  IR  data  is  critical  to  classification  performance 

To assess the utility of the IR-based cell-type classification, we repeated the above exercises
after extracting features without the guidance of the IR data; i.e., epithelial cells were predicted
from the H&E images alone (see Supplementary Materials for details). All of the features
defined in §3 were used, except for “Spatial association of lumens and apical regions”, since the
distinction between cytoplasm-rich and nuclear-rich regions in epithelial cells was unclear in
H&E images. The results from this disadvantaged classifier are shown in Tables 1 and 2
(“feature extraction” = “HE only”). For both types of experiments, we obtained lower average
AUCs and specificity values. For instance, the AUC of cross-validation in Data2 (Table 1)
dropped from 0.974 to 0.880. Similarly, the results of validation between datasets (Table 2) were
substantially worse than with the IR-guided classification, with the AUC dropping from
0.956 to 0.918. This indicates that feature extraction with the help of the IR cell-type
classification is critical to consistent and reliable classification of cancer versus benign tissue
samples.


Dataset  Feature      AUC              Sensitivity  Specificity (%)  Mf
         Extraction   AVG     STD      (%)          AVG     STD
Data1    IR & HE      0.982   0.0030   90           94.76   1.64     13
                                       95           90.91   1.62
                                       99           77.80   5.52
         HE only      0.968   0.0052   90           91.64   2.26     11
                                       95           83.90   1.91
                                       99           53.43   13.65
Data2    IR & HE      0.974   0.0145   90           92.53   7.11     7
                                       95           84.19   10.84
                                       99           49.54   22.51
         HE only      0.880   0.0175   90           61.34   10.31    8
                                       95           22.21   10.06
                                       99           11.21   6.01

Table 1. Classification results via cross-validation.

AVG and STD denote average and standard deviation across ten repeats of cross-validation. Mf
is the median size of the feature set obtained by feature selection from training data. Column
“Feature Extraction” indicates if features were obtained using H&E as well as IR data, or with
H&E data alone.


Feature      Dataset  AUC              Sensitivity  Specificity (%)  Mf
Extraction            AVG     STD      (%)          AVG     STD
IR & HE      Train    0.994   0.0006   90           98.30   0.68     13
                                       95           96.58   1.10
                                       99           91.55   2.55
             Test     0.956   0.0089   90           88.57   5.96
                                       95           81.92   5.28
                                       99           26.86   15.50
HE only      Train    0.986   0.0021   90           97.77   0.97     10
                                       95           91.56   2.49
                                       99           79.29   4.47
             Test     0.918   0.0100   90           65.51   8.37
                                       95           46.14   7.53
                                       99           13.29   6.94

Table 2. Validation between datasets.

A classifier is trained on Data1 and tested on Data2. AVG and STD denote the average and
standard deviation. Mf is the median size of the optimal feature set. Column “Feature Extraction”
indicates if features were obtained using H&E as well as IR data, or with H&E data alone.
Column “Dataset” indicates if the performance metrics are from training data (Data1) or from
test data (Data2).

Previously, Tabeshi et al. achieved an accuracy of 96.7% via cross validation in cancer/no-cancer
classification. Color, morphometric, and texture features were extracted, and all images
were acquired under similar conditions. We note that our classification result (Table 1), based
solely on morphology, is comparable to their result; however, the software developed by Tabeshi
et al. was not available for evaluation on our data sets. Color and texture features could provide
additional information; however, their robustness to different data sets is questionable, and their
interpretation is not as obvious as that of morphological features, which are used in clinical
practice. Different data sets may have varied properties which may be attributable to staining
variations, inconsistent image acquisition settings, and image preparation. The performance of
the same method based on texture features has been seen to greatly change from one data set to
another. Variations in staining may affect color features. In contrast, morphological features
were shown to be robust to varying image acquisition settings. Nonetheless, the quality of
morphological features is subject to segmentation of histologic objects. Thus, any method based
on morphological features will benefit from the IR cell-type classification.


Number  of  Lumen  =  17 
Number  of  Nuclei  =  755 


Figure  7.  Global  and  Local  Feature  Extraction. 

Global  features  are  extracted  from  the  entire  tissue  sample,  and  local  features  are 
extracted  by  sliding  a  window  of  a  fixed  size  across  the  tissue  sample  and  computing 
summary  statistics,  such  as  standard  deviation,  of  window-specific  scores.  In  this 
example,  the  global  feature  “number  of  nuclei”  has  value  755,  while  one  example 
position  of  the  sliding  window  is  shown,  with  “number  of  nuclei”  =  29. 
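
The global/local distinction in Figure 7 can be sketched as follows for the “number of nuclei” quantity. This is a hedged illustration, not the extraction code used here: it assumes nuclei are given as (row, column) centroids, and the window size, step, and toy coordinates are hypothetical.

```python
import numpy as np


def window_counts(points, height, width, win, step):
    """Count points inside each position of a win x win sliding window.

    points: (row, col) centroids, e.g. of segmented nuclei (assumed input).
    Windows are visited row by row from the top-left corner.
    """
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    counts = []
    for top in range(0, height - win + 1, step):
        for left in range(0, width - win + 1, step):
            inside = (
                (points[:, 0] >= top) & (points[:, 0] < top + win)
                & (points[:, 1] >= left) & (points[:, 1] < left + win)
            )
            counts.append(int(inside.sum()))
    return np.array(counts)


def local_features(counts):
    """Summary statistics of the window-specific scores (local features)."""
    return {
        "AVG": float(np.mean(counts)),
        "STD": float(np.std(counts)),
        "MAX": int(np.max(counts)),
    }


# Toy example: 4 nuclei on an 8 x 8 sample, non-overlapping 4 x 4 windows.
demo_nuclei = [(1, 1), (2, 2), (6, 6), (7, 1)]
demo_global = len(demo_nuclei)                      # global feature
demo_counts = window_counts(demo_nuclei, 8, 8, win=4, step=4)
demo_local = local_features(demo_counts)            # local features
```

The global feature is a single number for the whole sample, whereas the local features describe how unevenly that quantity is distributed across the tissue.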

4.  Examination  of  discriminative  features 

We examined the importance of each feature by its rank in the first phase of feature selection,
based on its “relevance” to the class label (see Supplementary Materials, mRMR). Since
different features (e.g., average or standard deviation, global or local features) based on the same
underlying quantity (e.g., “lumen roundness”) generally have similar relevance, we examined the
average relevance of features in each of 17 feature categories (Figure 8), for each data set. The
complete list of the individual features and their relevance and mRMR rank (for Data1) is
available in Figure 9. For Data1, lumen-related feature categories are most relevant in general,
while epithelium-related feature categories are most important for Data2. It is surprising that the
top 3 feature categories in Data1 (Figure 8, blue bars) - size of lumen, lumen roundness, and
lumen convex hull ratio - have very low relevance in Data2, although we note that this may be
in large part due to variations in staining and malignancy of tumors between the two data sets.
Also, examining the features (or feature categories) with highest relevance alone may be slightly
misleading, because this examination does not account for redundancy among features.

Figure 8. Importance of 17 feature categories.

The  average  “maximal  relevance”  of  features  belonging  to  each  feature  category  is 
shown,  for  both  data  sets,  sorted  in  decreasing  order  for  the  first  data  set. 
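
As a rough sketch of how a per-category relevance could be computed, the code below takes relevance to be the mutual information between a feature and the class label, which is the relevance term of the standard mRMR criterion. It assumes features have already been discretized; the binning, the function names, and the toy values are assumptions and are not taken from the Supplementary Materials.

```python
from collections import defaultdict

import numpy as np


def mutual_information(x, y):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    x = np.asarray(x)
    y = np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)
        for yv in np.unique(y):
            py = np.mean(y == yv)
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return float(mi)


def category_relevance(features, labels):
    """Average relevance of the features belonging to each category.

    features: iterable of (category_name, discretized_values) pairs.
    """
    per_category = defaultdict(list)
    for category, values in features:
        per_category[category].append(mutual_information(values, labels))
    return {c: float(np.mean(v)) for c, v in per_category.items()}


# Toy data: 4 samples, binary class label, already-binned feature values.
demo_labels = [0, 0, 1, 1]
demo_features = [
    ("lumen roundness", [0, 0, 1, 1]),   # perfectly informative
    ("lumen roundness", [0, 1, 0, 1]),   # uninformative
    ("number of nuclei", [1, 1, 0, 0]),  # perfectly informative
]
demo_relevance = category_relevance(demo_features, demo_labels)
```

Averaging within a category hides redundancy among its members, which is the caveat raised in the text about reading relevance values in isolation.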


Figure  9.  List  of  features  and  their  maximal  relevance  and  “mRMR  rank”. 

In the second column, G and L represent global and local features, respectively. AVG,
STD, TOT, and MAX denote the average, standard deviation, total amount, and
extremal value of features. * In computing local features representing “size of lumen”,
two options are available: one is to consider only the part of the lumen within the
window, and the other is to take the entire lumen into account. The asterisk indicates
that the former option was chosen.


Figure  10.  Optimal  features  for  distinguishing  cancer  and  benign  tissue  samples. 

The  four  features  shown  here  are  always  present  in  the  optimal  feature  set  chosen  by 
the  classifier. 


Conclusions 

In  completing  this  task,  we  have  presented  a  means  to  eliminate  epithelium  recognition 
deficiencies  in  classifying  H&E  images  for  presence  or  absence  of  cancer.  The  method  is  entirely 
transparent  to  a  user  and  does  not  involve  any  adjustment  or  decision-making  based  on  spectral 
data.  We  were  able  to  achieve  very  effective  fusion  of  the  information  from  two  different 
modalities,  namely  optical  and  IR  microscopy,  that  provide  very  different  types  of  data  with 
different  characteristics.  Several  features  of  the  tissue  were  quantified  and  employed  for 
classification. We found that robust classification could be achieved using a few measures,
which arise from epithelial/lumen organization and provide a reasonable
explanation for the accuracy of the model. The choice of combining the IR and optical data is
shown  to  be  necessary  for  achieving  the  high  accuracy  values  observed.  We  anticipate  that  the 
combined  use  of  the  two  microscopies  -  structural  and  chemical  -  will  lead  to  an  accurate,  robust 
and  automated  method  for  determining  cancer  within  biopsy  specimens. 


TASK  2F:  DEVELOP  CALIBRATION  FOR  PREDICTING  CANCER  GRADE 
(MONTHS  18-22) 

Motivation: 

Quality assurance in clinical pathology plays a critical role in the management of patients with
prostate cancer, as pathology is the gold standard of diagnosis and forms a cornerstone of patient
therapy. Methods to integrate quality development, quality maintenance, and quality
improvement to ensure accurate and consistent test results are, hence, critical to cancer
management in any setting. These factors have a direct bearing on patient outcomes, financial
aspects of disease management, and malpractice concerns. One of the major failings in
prostate pathology today is the rate of missed tumors and variability in grading. It is well known
that the grading of prostate tissues suffers from intra- and inter-pathologist variability. In
studies of intra- and inter-pathologist reproducibility, exact intra-pathologist agreement was
achieved in 43-78% of the instances, and exact inter-pathologist agreement was reported in
36-81% of the instances. It is also known that the variability of the grading can be reduced
after pathologists are re-trained. There are many ways to educate pathologists, such as
meetings, courses, and online tutorials, but these are not time- and cost-effective for routine
everyday decisions. Therefore, building an automated, fast, and objective method to aid
pathologists in examining prostate tissues will greatly help to attain reliable and consistent
diagnoses. This will reduce healthcare costs and the chances of malpractice lawsuits as well as
improve patient outcomes in therapy.

Innovation  in  our  approach  and  potential  benefits: 

When a pathologist examines tissue, he/she looks at a stained image of tissue and mentally
compares it against a database of previous knowledge or information in books. In essence, the
pathologist is manually matching structural patterns he/she has seen earlier and mentally
recalling the diagnosis made, so that he/she can make the same diagnosis in the specific test
case. Here, we report developing a computer-based information management and decision-making
system that relies on one or more measures of the structure of tissue to provide images from a
database that are similar to the sample under consideration. We emphasize that the system does
not provide a diagnosis but simply provides the closest matching cases that enable a pathologist
to make a diagnosis. We also propose here the new idea of constructing a database of
pre-examined prostate tissues and providing similar tissue samples from the database to
pathologists while they examine an unknown tissue sample. To our knowledge, no such system
currently exists. Further, we propose that our system may or may not use infrared chemical
imaging data in comparisons. By comparing with the pre-examined tissue samples, we expect
pathologists to make more consistent and accurate decisions. As we build a database of prostate
tissue samples, we represent each tissue sample by its morphology. Given an unknown tissue
sample, the similarities between the unknown sample and the tissue samples in the database are
measured based on the morphological properties, and the most similar tissue samples are
retrieved. The pathologist may indicate that certain matches were better than others, resulting in
an updating of the database and matching algorithms as needed. The updating may be conducted
in real-time.

Work  accomplished: 

Morphological features have been shown to be able to characterize prostate tissues and can be
used for diagnostic purposes. Here, 67 morphological features, which are based on lumens and
epithelial  nuclei,  were  extracted  from  each  tissue  sample.  The  database  stores  the  morphological 
features  for  the  tissue  samples  which  have  already  been  examined  by  pathologists. 

Given an unknown prostate tissue sample (the query), the morphological features are first
extracted from the tissue sample. Second, the similarities between the query and the tissue
samples in the database are computed using Euclidean distance based on the morphological
features. Lastly, the k tissue samples most similar to the query are retrieved from the database.
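
The three retrieval steps can be sketched as below. This is an illustrative sketch rather than the implementation built into our software: the function name and the two-feature toy database are hypothetical, and any rescaling of features before computing distances is left out because the text does not specify it.

```python
import numpy as np


def retrieve_similar(query, database, k):
    """Return (indices, distances) of the k database samples nearest to the query.

    query:    1-D array of morphological features for the unknown sample.
    database: 2-D array, one row of features per pre-examined sample.
    Distances are Euclidean; results are ordered nearest first.
    """
    database = np.asarray(database, dtype=float)
    query = np.asarray(query, dtype=float)
    dists = np.sqrt(((database - query) ** 2).sum(axis=1))
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]


# Toy database with 2 features per sample (the real system uses 67).
demo_database = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [10.0, 10.0]]
demo_indices, demo_dists = retrieve_similar([0.0, 0.0], demo_database, k=2)
```

Because Euclidean distance is sensitive to the units of each feature, a practical system would likely need some form of feature normalization; that choice is outside what is described here.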


To assess the method, we tested it on a dataset composed of 181
tissue samples. In the dataset, 5, 23, 66, and 21 tissue samples are Gleason grade 2, 3, 4, and 5
cancer (“Cancer”), respectively, and 20 and 46 tissue samples are BPH and normal (“Benign”),
respectively. Due to the small number of tissue samples, Gleason grade 2 is excluded from
further consideration. As mentioned above, each tissue sample is represented by 67
morphological features.

In order to measure the performance of the method, we adopted the k-nearest neighbor (kNN)
algorithm and predicted the grade of the query by majority voting. Both accuracy and the kappa
coefficient were computed for the predictions. Since pathologists may be more interested in
grading of cancerous tissue samples, we also applied our method only to the “Cancer” tissue
samples; i.e., Gleason grade 3, 4, and 5 samples.
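
A minimal sketch of the voting and scoring steps follows. It assumes that the kappa coefficient is Cohen's unweighted kappa and that ties in the vote go to the label encountered first in a nearest-first list; neither detail is stated in the text, and the grades in the example are made up.

```python
from collections import Counter


def majority_vote(neighbor_labels):
    """Predict the most frequent label among the retrieved neighbors.

    neighbor_labels is assumed to be ordered nearest first; on a tie the
    label that appears first in that order wins.
    """
    return Counter(neighbor_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's unweighted kappa: agreement corrected for chance.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical Gleason grades for six query samples.
demo_true = [3, 3, 4, 4, 5, 5]
demo_pred = [3, 4, 4, 4, 5, 3]
demo_accuracy = accuracy(demo_true, demo_pred)
demo_kappa = cohen_kappa(demo_true, demo_pred)
```

Kappa is reported alongside accuracy because, with several grades, a sizeable fraction of agreements can occur by chance alone.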

We performed leave-one-out (LOO) cross-validation on the dataset. LOO holds out one example as
validation data and uses the remaining examples as training data. In our method, the validation
example is the query, and the training data is regarded as the database. It should be noted that the
number of tissue samples in each grade in the dataset varies. The imbalance in the dataset could
affect the prediction made by the kNN algorithm. To tackle the problem, we randomly selected the
same number of tissue samples from each grade and performed LOO on the sub-dataset. This
was repeated 100 times, and the average accuracy and kappa coefficient were computed over the
repeats.
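
The balanced, repeated LOO procedure can be sketched as follows. This is a simplified illustration under stated assumptions: each class is subsampled to the size of the smallest class, ties in the kNN vote go to the smallest label, and the random seed and toy data are arbitrary. It is not the code used to produce the reported numbers.

```python
import numpy as np


def balanced_subsample(labels, rng):
    """Pick the same number of samples (the smallest class size) from each class."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n = counts.min()
    picked = [
        rng.choice(np.nonzero(labels == c)[0], size=n, replace=False)
        for c in classes
    ]
    return np.sort(np.concatenate(picked))


def loo_knn_accuracy(X, y, k=1):
    """Leave-one-out accuracy of a kNN majority vote with Euclidean distance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    correct = 0
    for i in range(len(y)):
        d = np.sqrt(((X - X[i]) ** 2).sum(axis=1))
        d[i] = np.inf  # the query is never part of its own database
        nearest = np.argsort(d, kind="stable")[:k]
        values, votes = np.unique(y[nearest], return_counts=True)
        correct += int(values[np.argmax(votes)] == y[i])
    return correct / len(y)


def repeated_balanced_loo(X, y, k=1, repeats=100, seed=0):
    """Average LOO accuracy over repeated class-balanced random subsamples."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    accs = []
    for _ in range(repeats):
        idx = balanced_subsample(y, rng)
        accs.append(loo_knn_accuracy(X[idx], y[idx], k))
    return float(np.mean(accs))


# Toy data: two well-separated groups with unequal sizes (5 vs 3 samples).
demo_X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10], [10, 11], [11, 10]]
demo_y = ["a"] * 5 + ["b"] * 3
demo_mean_accuracy = repeated_balanced_loo(demo_X, demo_y, k=1, repeats=10)
```

Subsampling discards data from the larger classes in each repeat, so averaging over many repeats is what lets every sample contribute to the final estimate.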

Our method is subject to the choice of the number of nearest neighbors considered for the
prediction and the number of features used for the similarity computation. To examine their
effect, we computed the average accuracy and kappa coefficient over 100 repeats while
increasing the two factors (Fig. 11). The accuracy decreases as the number of nearest
neighbors increases, and the more features we use, the higher the accuracy achieved. The highest
average accuracy achieved for grading both “Cancer” and “Benign” samples (i.e., 5 grades) was 42%,
using 7 features and 1 nearest neighbor (Fig. 11a). Using 8 features and 1 nearest neighbor, the
highest accuracy of 52% was achieved for grading only “Cancer” samples (i.e., 3 grades) (Fig. 11c).
Both cases also achieved an average kappa coefficient of 0.27 (Fig. 11b, d). Fig. 12 shows the
distribution of the grades of the retrieved samples. The distinction between “Cancer” and
“Benign” samples is obvious (Fig. 12a), but among “Cancer” samples, the retrieved samples often
do not belong to the same grade as the query, especially between Gleason grades 3 and 4.

Figure 11. Average accuracy and kappa coefficient. (a), (b) Grading for both “Cancer” and
“Benign” samples; (c), (d) grading for “Cancer” samples. Each line depicts the accuracy and
kappa coefficient values for the corresponding number of features.

Figure 12. Distribution of the grade of the retrieved samples. (a) Grading for both “Cancer”
and “Benign” samples; (b) grading for “Cancer” samples. For the samples in each grade, the
grades of the retrieved samples are counted and the average number of samples is shown. The
arrows denote ±1 standard deviation of the number of samples.


TASK  2G:  DEVELOP  PROTOCOLS  AND  VALIDATE  GLEASON  GRADING  OF 
TUMOR  (MONTHS  18-27) 

The task above provides details of the development and LOO validation. More rigorous
validations are needed, but the preliminary results shown here have been used to validate the
grading correspondence and the protocols we have developed, as noted above.

It  is  important  to  place  the  magnitude  of  our  advance  in  context.  Several  research  efforts  have 
been  made  to  develop  automated  systems  for  the  grading  of  prostate  tissues.  The  majority  of 
systems have used texture and/or morphological features to characterize and classify tissue
samples  into  correct  classes.  However,  the  information  which  pathologists  will  obtain  by  using 
such  methods  may  be  limited  since  these  only  provide  the  predicted  grade  in  general.  The 
prediction  also  relies  on  the  training  data.  Most  importantly,  these  prior  efforts  always  sought  to 
match  a  sample  completely  to  provide  a  diagnosis,  rather  than  provide  matching  candidates. 
Further,  the  role  of  other  modalities  in  the  process  was  not  clear.  Here,  we  may  also  use  IR 
chemical imaging data in matching. Our premise is that tissue samples which have the same
grade as, and similar characteristics and patterns to, the sample of interest will afford more
information to pathologists; hence, the system enables matching to a database rather than
seeking to provide an unequivocal diagnosis.

Future  outlook  enabled  by  this  progress: 

The  matching  system  would  be  implemented  first  for  a  clinical  trial  and  then,  would  be  ready  for 
commercial  translation.  While  a  true  clinical  trial  is  the  next  step,  some  further  development  of 
the actual methods may be expected. We have built the method into existing software in a
user-friendly form.


Task  3.  Develop  mathematical  framework  to  correlate  spectral,  spatial  and  clinical 
parameters  with  cancer  progression 

a.  Identify  and  validate  spectral  metrics  and  develop  spatial  metrics  indicative  of  tumor 
grade  (Months  27-30) 

b.  Develop  prediction  algorithm  for  predicting  outcome  (Months  30-36) 

Activities: 

We  have  imaged  460  patients  with  full  outcome  data  and  identified  several  metrics  that  are 
indicative of tumor grade (please see task 2 as well). A mathematical framework to correlate the
spectral, spatial and clinical parameters with cancer progression has been built using logistic
regression.  The  prediction  algorithm  is  available  for  use  and  will  be  validated.  The  task  for  this 
project  was  to  develop  the  algorithms  and,  hence,  the  task  is  complete. 


Task  4.  Write  reports  and  finalize  algorithms  into  software  (Months  33-36) 


A  number  of  reports  (invention  disclosure,  conference  etc.)  have  been  written  and  manuscripts 
based  on  this  work  have  been  submitted  and  have  been  printed  as  detailed  in  the  following 
sections. 

In  summary,  the  promised  work  has  been  accomplished  to  a  reasonable  degree  and  has  opened 
up  significant  doors  to  future  progress  in  prostate  pathology  as  a  research  direction  as  well  as  for 
patients  and  clinicians. 

Key Research Accomplishments

•  A genetic algorithm based method to distinguish benign from malignant epithelium using infrared
spectroscopic imaging data was shown to be effective. Large scale validation shows promising results and
a manuscript is being written.

•  We  determined  that  one  of  the  key  factors  in  understanding  our  data  was  the  spatial  structure  of  the  tissue, 
that  closely  affected  the  IR  data.  A  series  of  simulations  were  conducted  after  developing  a  rigorous  optical 
model  to  predict  distortions.  Results  are  reported  in  two  manuscripts  in  Anal.  Chem. 

•  A  combination  of  IR  and  conventional  pathology  imaging  has  been  developed  and  extensively  validated. 
The manuscript has been submitted to BMC Cancer.

•  A  method  to  correlate  Gleason  grades  with  measured  data  has  been  developed.  Larger  validation  studies 
are  needed.  An  invention  disclosure  has  been  filed  and  a  patent  will  be  filed  soon.  Subsequently,  we  will 
submit  a  manuscript  based  on  the  work  to  Cancer  Research. 


Reportable Outcomes

Conclusion

The  work  accomplished  demonstrates  clear  potential  and  protocols  for  classifying  prostate  tissue. 

If  the  protocols  are  validated  in  on-going  larger  studies  and  translated  to  the  clinic,  a  new  tool  for 
prostate  histopathology  will  be  available  for  pathologists  and  benefits  will  be  realized  by 
patients. 

So  What  Section 

An  automated  method  to  assist  prostate  pathologists  is  available  and  can  rapidly  determine  the 
presence  of  cancer  in  biopsies.  An  automated  aid  to  grading  is  available  to  aid  pathologists  in 
making  accurate  decisions.  Clinical  translation  of  these  discoveries  can  directly  improve  prostate 
healthcare,  resulting  in  better  treatment  of  individuals. 

Abstract Fourier transform infrared (FTIR) chemical imaging
is a strongly emerging technology that is being
increasingly  applied  to  examine  tissues  in  a  high-throughput 
manner.  The  resulting  data  quality  and  quantity  have 
permitted several groups to provide evidence for applicability
to cancer pathology. It is critical to understand,
however,  that  an  integrated  approach  with  optimal  data 
acquisition,  classification,  and  validation  is  necessary  to 
realize  practical  protocols  that  can  be  translated  to  the  clinic. 
Here,  we  first  review  the  development  of  technology 
relevant  to  clinical  translation  of  FTIR  imaging  for  cancer 
pathology.  The  role  of  each  component  in  this  approach  is 
discussed  separately  by  quantitative  analysis  of  the  effects 
of  changing  parameters  on  the  classification  results.  We 
focus  on  the  histology  of  prostate  tissue  to  illustrate  factors 
in developing a practical protocol for automated histopathology.
Next, we demonstrate how these protocols can be
used  to  analyze  the  effect  of  experimental  parameters  on 
prediction  accuracy  by  analyzing  the  effects  of  varying 
spatial  resolution,  spectral  resolution,  and  signal  to  noise 
ratio.  Classification  accuracy  is  shown  to  depend  on  the 
signal  to  noise  ratio  of  recorded  data,  while  depending  only 
weakly  on  spectral  resolution. 

Introduction

Cancer  is  one  of  the  leading  causes  of  death  in  the  western 
world  and  is  becoming  increasingly  prevalent  worldwide.  It 
is  well  established  that  appropriate  therapy  for  cancers 
diagnosed  early  generally  leads  to  improved  prognosis  and 
longer  survival.  Consequently,  population  screening  tests  to 
detect  disease  are  increasingly  being  deployed.  The 
emphasis  in  screening  populations  is  on  obtaining  a  high 
sensitivity  through  simple  diagnostic  tests.  For  example,  the 
prostate-specific  antigen  (PSA)  assay  [1]  helps  triage 
persons at risk for prostate cancer. A cutoff level (typically
4 ng mL⁻¹) or increase in PSA velocity implies that the
screened  person  should  be  at  heightened  surveillance  and 
typically undergoes a biopsy to confirm disease. Morphologic
structures in biopsied tissue, as diagnosed by a
pathologist,  are  the  only  definitive  indicator  of  disease 
and  form  the  gold  standard  of  diagnosis  [2].  Along  with 
clinical history, stage, and PSA values, pathologic diagnoses
form a cornerstone of clinical therapy and serve as a
basis  for  a  vast  majority  of  research  activity  [3]. 

Typically,  multiple  samples  are  withdrawn  from  the 
organ  during  biopsy.  Extracted  tissue  samples  are  fixed, 
embedded, and sectioned (typically to 1- to 5-µm thickness)
onto  a  glass  slide  for  review.  By  itself,  tissue  does  not  have 
much  useful  contrast  in  optical  brightfield  microscopy. 
Hence,  the  prepared  slide  is  stained  with  dyes.  A  mixture  of 
hematoxylin  and  eosin  (H&E)  is  commonly  employed, 
staining  protein-rich  regions  pink  and  nucleic  acid-rich 
regions  of  the  tissue  blue,  for  example,  as  shown  in  Fig.  1. 
Using  the  contrast,  a  trained  person  can  recognize  specific 
cell  types  and  alterations  in  local  tissue  morphology  that  are 
indicative  of  disease.  In  prostate  tissue,  epithelial  cells  line 
three-dimensional  ducts.  In  two-dimensional  thin  sections, 
thus,  the  cells  appear  to  line  empty  circular  regions  (lumen). 

Fig. 1 Brightfield microscopy
images of unstained (left) and
stained (right) prostate tissue
sections. Hematoxylin and eosin
(H&E) stains provide contrast,
allowing a trained person to
recognize epithelial cells and
ductal structure (lumen), while
ignoring artifacts and confounding
morphologies. A trained
human can also learn to robustly
recognize patterns within lumen
that indicate cancer. The scale
bar corresponds to 100 µm




Distortions  in  normal  lumen  appearance  provide  evidence 
of  cancer  and  characterize  its  severity  (grade).  The  process 
is  fundamentally  a  manual  pattern  recognition  that  seeks 
to  match  observations  to  known  healthy  or  diseased 
morphologies. 

Manual  examination  of  biopsies  is  very  powerful  in  that 
humans  can  not  only  recognize  disease  generally  but  can 
also  overcome  confounding  preparation  artifacts,  detect 
unusual  cases,  and  recognize  deficiencies  in  diagnostic 
quality. This capability of considering and neglecting features
based on prior knowledge is crucial for accurate and
robust  diagnoses.  The  process,  however,  is  time  consuming, 
allows  for  limited  throughput  and,  frequently,  leads  to 
variance  in  subjective  judgments  about  the  disease  severity, 
i.e.,  grade  [4].  As  an  alternative,  computer-based  pattern 
recognition  approaches  to  diagnose  disease  may  provide 
more  accurate,  reproducible,  and  automated  approaches  that 
could reduce variance in diagnosis while proving economically
favorable. Hence, attempts have been made to
characterize  morphology  using  H&E  image  analysis  as 
well  as  biomarkers  to  stain  for  specific  molecular  features. 
Automated  approaches  that  can  rival  human  performance  in 
usual  clinical  settings,  however,  are  still  unavailable. 
Specifically,  the  attributes  of  high  accuracy  and  robust 
applicability  are  lacking. 

The  information  content  of  H&E-stained  images  is 
limited  and  attempts  to  automatically  recognize  structural 
patterns  indicative  of  prostate  cancer,  unfortunately,  have 
not led to clinical protocols. Similarly, probe-based molecular
imaging can provide exquisite information regarding
the  location  and  content  of  specific  epitopes  but  is  limited 
by  complex  diseases  not  expressing  universally  the  same 
epitopes  or  panels  of  markers.  Stains  used  can  generally 
detect  one  feature  that  may  aid  diagnosis  (e.g.,  AMACR) 
but  do  not  provide  entire  diagnostic  information  in 
themselves.  An  exciting  alternative  is  emerging  in  the  form 
of  chemical  imaging  and  microscopy  [5].  As  opposed  to 
conventional dye-assisted imaging or probe-assisted molecular
imaging, chemical imaging [6] seeks to directly
measure  the  identity  and/or  concentration  of  chemical 
species in the sample using spectroscopy. Hence, no
molecular probes (MPs) are needed to see the presence of
specific epitopes; instead computer algorithms are used to
extract information from the data (instead of MP hybridization)
and statistical methods are used to provide
confidence (as opposed to brown tints for MPs). The
approach  is  limited  only  by  the  ability  of  the  technology  to 
sense  specific  types  of  molecules  or  otherwise  resolve 
chemical  species  and  morphologic  structures.  Among  the 
prominent approaches are vibrational spectroscopic imaging,
both Raman and infrared (IR), as well as mass
spectroscopic imaging (MSI) [7, 8] and magnetic resonance
spectroscopic imaging (MRSI) [9]. While each technology
promises a specific measurement (e.g., proteins or metabolic
products) for specific situations (e.g., in vivo or ex
vivo), IR spectroscopic imaging [10] is particularly attractive
for the analysis of tissue biopsies in that it permits a
rapid  and  simultaneous  fingerprinting  of  inherent  biologic 
content,  extraneous  materials,  and  metabolic  state  [11-14]. 

IR  spectroscopic  imaging,  generally  practiced  using 
interferometry  and  termed  Fourier  transform  infrared 
(FTIR)  spectroscopic  imaging  or,  succinctly,  FTIR  imaging, 
offers  a  particular  combination  of  spatial,  spectral,  and 
chemical  detail  [15].  Limitations  of  FTIR  imaging  include 
coarser spatial resolution compared to Raman imaging or
high-powered optical microscopy and lack of specific
molecular detail compared to MSI. Tissue biopsies are
examined as thin sections on a solid substrate. The tissue is
dehydrated and is stable due to fixation. Typically, structures
of pathologic interest are several to hundreds of
micrometers in size, requiring fairly moderate magnifications
for decision making. These conditions imply that the
need  to  image  in  vivo,  at  exceptionally  high  spatial 
resolution,  or  in  aqueous  environments  is  not  critical  and 
that  standard  pathologic  laboratory  processing  can  be 
employed  for  IR  imaging.  Due  to  the  linear  absorption 
process  being  utilized,  the  signal  from  IR  spectroscopy  is 
large  and  readily  obtained,  promising  relatively  simple 
instrumentation.  Hence,  the  technology  provides  a  platform 
that  is  potentially  useful  for  clinical  practice  in  pathology.  It 
must be emphasized that no particular technology is ideally
suited to all applications but a careful matching of the
technique to the application can lead to useful protocols.
While the potential advantages of FTIR imaging for
examining tissue biopsies are high, practical protocols for
clinical deployment are still being developed by many groups.

Fig. 2 Potential application of FTIR imaging for pathology. The
current paradigm of cancer diagnosis and grading upon biopsy
involves sample processing, staining, and pathologist review (left,
shaded boxes). To implement the paradigm of automated analysis
(right, unshaded boxes), IR chemical imaging is followed by
computer analysis for diagnosis. Since IR imaging is label-free and
non-perturbing, the sample can be stained, providing the pathologist
with both IR chemical and conventional stained images

Numerous  recent  reviews  are  available  to  address 
biomedical  applications  of  FTIR  spectroscopy  and  imaging 
[16-20],  especially  related  to  diseases  and  cancer.  These 
reviews address instrumentation, the applicability to various
systems, spectroscopic bases and classification algorithms
for  decision  making,  and  controversial  aspects  in  the 
backdrop  of  the  evolution  of  the  field.  The  commercial 
availability  of  high-fidelity  FTIR  imaging  instruments, 
advances  in  computers  and  data  analysis  algorithms,  and 
increasing  interest  have  combined  to  generate  an  increasing 
volume  of  studies.  At  the  same  time,  there  is  considerable 
debate  emerging  on  various  aspects  of  the  process.  Reports 
study a variety of organs that may not correlate in behavior,
utilize different sample acquisition and processing techniques,
employ different instrumentation, data acquisition,
or handling protocols, and apply a variety of decision-making
algorithms. While this has led to a lively community
of practitioners and exploration of various facets such as
resolution, biological diversity, and chemometric or statistical
methods, studies have generally focused on one aspect.
Many excellent studies have developed each of these
aspects to the point of routine use in advanced laboratories.
The  focus  in  the  field  is  now  on  understanding  biochemical 
signals  and  developing  protocols  from  high  quality  data  that 
can  actually  lead  to  clinical  acceptance.  We  contend  that  the 
development  of  clinical  protocols  is  necessarily  integrative 
and,  in  this  manuscript,  review  first  the  salient  aspects  in 
developing  a  practical,  integrative  approach  to  spectroscopic 
imaging for cancer histopathology. Second, we discuss the
issues of spatial selectivity, sample size calculations,
optimization considerations, and potential improvements in
algorithms that can provide faster results. Tests to determine
performance and limits of accuracy are reported as a
function of experimental parameters. We focus here on
prostate histology as an illustrative test case, but emphasize
that the approach is applicable and similar insight is gained
with other tissues [21]. Further, exciting results have
recently been reported for diagnosis, grading, and classification
of prostate cancer [22-26], including the effects of
zonal anatomy [27] and cytokinetic activity on spectra [28].
An extension of the methodology here to pathology will
help formulate better protocols and allow a better understanding
of the performance of classifiers.

Fig. 3 Correspondence of conventionally stained and FTIR chemical
images for pathology applications. a Hematoxylin and eosin (H&E)-
stained image of prostate tissue section. Hematoxylin stains negatively
charged nucleic acids (nuclei & ribosomes) blue, while eosin stains
protein-rich regions pink. The diameter of the sample is ca. 500 µm.
Simple univariate plots of specific vibrational modes provide for
enhancement or suppression of specific cell types. b Absorption at


rich epithelial cells in the manner of hematoxylin. c Spatial
distribution of a protein-specific peak (ca. 1,245 cm⁻¹) highlights
differences in the manner of eosin. The entire spectrum can be
analyzed for a series of markers that provide more information than
H&E or univariate images, as shown in d where specific cells are
color coded based on their spectral features (e)

Approach  and  essentials 

The  promise  of  chemical  imaging  for  pathology  is 
illustrated  in  Fig.  2.  Our  approach  has  been  to  attempt 
integration  of  our  developments  with  current  clinical 
practice.  Hence,  we  employ  tissues  that  have  been  biopsied, 
fixed,  embedded,  and  sectioned  as  per  usual  clinical 
protocols. We differ in the deparaffinization step, suggesting
a gentle wash with hexane, and do not stain the tissue.
Additionally, as IR chemical imaging employs only benign
light, it is non-perturbing and entirely compatible with all
downstream pathology processes. Hence, the sample may
be stained as usual (Fig. 2, dashed arrow, top). Visualizations
similar to those observed in conventional pathology
are possible without staining the tissue. For example,
Fig. 3 correlates H&E and infrared spectral images.
Visualizations similar to H&E images may be “dialed-in”
by utilizing specific spectral features indicative of tissue
chemistry. Although the IR data shown in the images are only
univariate representations, automated mathematical
algorithms  can  determine  the  cell  types  and  their  locations 
within  the  image,  while  providing  quantitative  measures  of 
accuracy  and  statistical  confidence  in  results  [29].  These 
data  may  be  employed  to  directly  provide  diagnoses  or  to 
inform  the  pathologist  (Fig.  2,  dashed  arrow,  bottom), 
helping  them  make  better  decisions.  Since  the  results  are 
images,  information  exchange  between  spectroscopists  and 
clinicians is facilitated. Spectroscopic analyses can potentially
be fully automated; thus, no additional users need to
be trained, nor do current clinicians need to acquire a new knowledge base.

A  major  challenge  in  the  field  is  the  development  of 
robust  algorithms  that  employ  spectral  data  to  provide 
histopathologic information. Both supervised and unsupervised
approaches have been employed. We believe that
unsupervised  methods  are  more  suited  to  research  and 
discovery.  Supervised  methods  are  preferred  when  the  data 
need  to  be  related  to  known  conditions,  e.g.,  clinical 
diagnoses.  The  development  of  supervised  classification 
of  IR  chemical  imaging  data  for  histopathology  is  fairly 
straightforward  [30].  The  process  is  shown  in  Fig.  4.  First,  a 
model  for  classification  is  selected.  The  model  comprises 
all  possible  outcomes  for  any  pixel  in  the  images  and  is, 
hence, bounded by definition. We term each histologic
constituent of the model a class to denote that it may not
correspond to specific cell types or entities corresponding
to morphology-based pathology. While this allows for
simplifications and allows the user to focus on specific
cells relevant in disease, it is also likely to prove useful in
the discovery of different chemical entities that appear
morphologically identical.

Fig. 4 Process for relating pathologic or physiologic state to
FTIR chemical imaging data. A model is chosen for supervised
classification (a). b-d Training data is reduced in size and
optimized into a prediction algorithm using gold standard
data. The developed algorithm is validated against a second,
independent data set and the accuracy is measured using
three different methods: ROC curves, confusion matrices, and
image comparisons

Next, data from a large number of tissue samples are
recorded. A set of pixels is specifically marked (gold
standard) by different colors to correspond to known
regions of tissue, usually by comparison with an H&E-
stained image or with immunohistochemically stained
images [21]. The recorded data set is reduced to a smaller
set of measures that capture the classification capability of
the entire data set. We term these measures metrics.

There  are  numerous  means  of  obtaining  the  metric  data 
set:  manual  selection  of  large  spectral  regions,  principal 
components  analysis,  genetic  algorithms,  or  a  sequential 
forward  selection  algorithm.  A  numerical  algorithm  is  then 
chosen,  for  example,  a  linear  discriminant  analysis,  neural 
network,  SIMCA,  or  modified  Bayesian  classifier  [31].  The 
classifier is optimized iteratively, if needed, to optimally
predict the training data set. Subsequently, the algorithm is
applied to a second data set (independent validation) that
has been independently marked for each class. A comparison
of the gold standard marking with the computationally
predicted class provides a measure of the accuracy.
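
As an illustration of one of the metric-selection options named above, the following is a minimal sketch of sequential forward selection. It is not the implementation used in this work: the function name, the stopping rule (stop when no remaining candidate improves the score), and the user-supplied `score` callable are all assumptions made for the example.

```python
def sequential_forward_selection(candidates, score, max_metrics):
    """Greedy forward selection (illustrative sketch, hypothetical API).

    candidates  : list of candidate metric identifiers
    score       : callable taking a list of metrics and returning a number
                  (higher is better), e.g. training-set accuracy
    max_metrics : upper bound on the number of metrics to keep
    """
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_metrics:
        # Evaluate adding each remaining metric to the current set.
        trials = [(score(selected + [m]), m) for m in remaining]
        top_score, top_metric = max(trials, key=lambda t: t[0])
        if top_score <= best_score:
            break  # assumed stopping rule: no candidate improves the score
        selected.append(top_metric)
        remaining.remove(top_metric)
        best_score = top_score
    return selected, best_score
```

In practice `score` would wrap whichever classifier is being trained; the sketch only shows the greedy search that surrounds it.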

We  have  employed  three  measures  of  accuracy:  receiver 
operating  characteristic  (ROC)  curves  [32]  that  represent 
the  sensitivity  and  specificity  trade-off  of  the  classifier, 
confusion  matrices  that  provide  the  fraction  of  pixels  of 
each  class  classified  as  pixels  of  all  classes,  and  classified 
images  that  can  be  compared  pixel-for-pixel  to  other 
images. Additionally, it is often instructive to drill into the
classifier to obtain the basis for classification or the
distribution of confidence intervals for various samples.
The last two factors are generally not apparent in previous
studies.
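
The confusion matrix described above (the fraction of pixels of each gold-standard class assigned to every class) can be sketched as follows. The function name and the class labels in the usage note are hypothetical and serve only to illustrate the definition.

```python
def confusion_fractions(gold, predicted, classes):
    """Row-normalized confusion matrix (illustrative sketch).

    gold, predicted : equal-length sequences of class labels per pixel
    classes         : list of all class labels in the model
    Returns a nested dict: result[gold_class][predicted_class] is the
    fraction of pixels of gold_class that were classified as predicted_class.
    """
    counts = {g: {p: 0 for p in classes} for g in classes}
    for g, p in zip(gold, predicted):
        counts[g][p] += 1
    fractions = {}
    for g in classes:
        total = sum(counts[g].values())
        fractions[g] = {
            p: (counts[g][p] / total if total else 0.0) for p in classes
        }
    return fractions
```

For example, with four gold-standard "epithelium" pixels of which three are predicted correctly, the entry for (epithelium, epithelium) is 0.75 and the remaining 0.25 appears in the column of the class the fourth pixel was assigned to.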

There  are  three  key  developments  that  are  needed  for 
this  approach  to  be  successful:  (a)  high-fidelity  FTIR 
imaging instrumentation, (b) high-throughput sampling,
and  (c)  robust  classification  that  provides  statistically 
significant  results  in  a  manner  that  can  be  appreciated  by 
non-experts  in  spectroscopy.  We  briefly  review  the  three 
developments  next. 

FTIR  imaging 

Need  for  spatially  resolved  data 

The  need  for  spatially  resolved  data  has  been  recognized 
[33], but the effect of limited resolution data on classification
is not entirely clear. The primary complication of
coarse spatial resolution, obviously, arises from boundary
pixels. These can be defined as pixels that are assigned to
one class but would likely resolve into more than one class
were finer resolution, down to physical limits, available. As a
consequence,  the  spectral  content  of  the  boundary  pixel  is 
likely  to  be  mixed  and  will  likely  lead  to  errors  in 
classification.  For  example,  the  confounding  contribution 
of  stromal  spectra  to  cancerous  epithelial  cells  in  breast 
tissue  has  been  proposed  [34].  As  the  resolution  becomes 
coarser,  the  fraction  of  pixels  in  an  image  that  belong  to 
boundary  pixels  increases.  Inclusion  of  these  pixels  has 
been  shown  to  be  a  primary  contributor  to  error  rates  in  data 
[29],  while  their  exclusion  in  accounting  for  accuracy 
necessarily  implies  that  not  all  pixels  are  included.  We 
sought  to  examine  the  effect  of  spatial  resolution  on  the 
prevalence  of  boundary  pixels. 

We binned data acquired at 6.25-µm pixel size from 148
samples in a validation data set (~7,000 pixels/sample) to 10-,
15-, 20-, 30-, and 50-µm pixel sizes. There is an important
distinction between pixel size and spatial resolution. The
pixel size denotes the best possible optical resolution, which
may be limited by longer wavelengths in the spectrum and
optical effects to yield a poorer measured resolution [35-38].
For each dataset, we classified the tissue and determined
neighbors of each pixel that did not belong to the class of
the pixel. Some pixels that have no neighbors of other
classes may still have empty pixels as neighbors. Since
neighboring empty pixels can only provide optical distortion
[39] but do not affect spectral content, we do not
consider  them  further.  The  number  of  neighbors  for 
epithelial  pixels  for  different  spatial  resolutions  may  be 
seen  in  Fig.  5.  The  first  observation  is  that  a  large  majority 
of  pixels  have  the  same  class  pixels  as  all  eight  neighbors. 
The fraction of pixels with all neighbors of the same class
decreases rapidly with decreasing resolution and stabilizes
at ca. 20 µm. Hence, a spatial resolution coarser than
20 µm is unlikely to have an effect on the classification but
is expected to lead to about 25% more epithelial pixels
being contaminated compared to 6.25-µm pixel sizes. The
precise effect on a specific sample is very dependent on the
sample morphology and is generally associated weakly
with pathologic state. While the statistic does not, in itself,
imply that results from coarser resolution studies will be
invalid, practitioners must recognize that error rates may be
higher and that this contribution may be mitigated by using
commonly available imaging systems.

Fig. 5 Neighbors of cell types other than epithelium or empty space
for different spatial resolutions (axis label: Neighbors of Other
Class). The inset shows the decrease in
percent epithelial pixels that do not have any other cell types as
neighbors
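
The neighbor analysis described above can be sketched in a few lines. This is an illustrative reconstruction, not the code used for Fig. 5: the label value used for empty pixels and the function names are assumptions, and the class map is taken as a plain list of lists.

```python
EMPTY = 0  # hypothetical label for pixels that contain no tissue


def foreign_neighbor_counts(labels, target):
    """For every pixel of class `target`, count the 8-connected neighbors
    whose class is neither `target` nor EMPTY (empty neighbors are ignored,
    as in the text). Returns the counts in row-major order."""
    rows, cols = len(labels), len(labels[0])
    counts = []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != target:
                continue
            n = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        v = labels[rr][cc]
                        if v != target and v != EMPTY:
                            n += 1
            counts.append(n)
    return counts


def pure_fraction(labels, target):
    """Fraction of `target` pixels with no neighbors of another tissue class."""
    counts = foreign_neighbor_counts(labels, target)
    if not counts:
        return 0.0
    return sum(1 for n in counts if n == 0) / len(counts)
```

Applying `pure_fraction` to class maps obtained at each pixel size would give the quantity plotted in the inset of Fig. 5; a histogram of `foreign_neighbor_counts` would give the main plot.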

One danger of classifying mixed composition pixels is
that they may be classified as an entirely different class
or disregarded from the data set as belonging to no class.
We  simulated  pixels  of  composition  ranging  from  0  to 
100%  for  pairs  of  each  class.  We  also  added  noise  to 
simulate  different  data  acquisition  conditions.  An  example 
of  the  data  can  be  seen  in  Fig.  6.  Average  spectra,  one  each 
from  the  two  classes,  are  baselined  and  added  in  ratios 
varying  linearly  from  0  to  100%.  Figure  6b  demonstrates 
the  classification  of  the  gradient  data  set.  In  general,  the 
classification  works  well,  favoring  the  class  with  higher 
concentration.  The  classifier  is  also  stable  at  the  noise  levels 
examined.  A  surprising  result  is  that  pixels  between 
epithelium  and  fibroblast-rich  stroma  are  classified  as 
mixed  stroma.  This  drawback,  however,  is  the  only 
example  of  two  classes  mixing  to  yield  an  entirely  different 
one.  The  reason  also  stems  from  the  definition  of  the  mixed 
stroma  class.  While  the  class  was  designed  to  handle  those 


stromal  cells  that  were  not  clearly  fibroblasts  or  smooth 
muscle  in  origin  but  appeared  mixed,  a  mix  of  epithelium  and 
fibroblast-type  stroma  also  leads  to  the  classification  as  mixed 
stroma.  Noise  seems  to  have  little  effect  on  this  behavior. 
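
The construction of the gradient data set can be sketched as below. This is a minimal illustration under stated assumptions: the spectra are plain lists of absorbance values that have already been baselined, the noise is taken to be additive Gaussian with a single standard deviation, and the function name and parameters are hypothetical.

```python
import random


def mix_spectra(spec_a, spec_b, n_steps, noise_sd=0.0, seed=0):
    """Linear mixtures of two class-average spectra (illustrative sketch).

    Returns n_steps spectra in which the fraction of spec_a rises linearly
    from 0 to 1 (i.e., 0 to 100%), with additive Gaussian noise of standard
    deviation noise_sd applied to every spectral element. Requires
    n_steps >= 2.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    mixtures = []
    for k in range(n_steps):
        f = k / (n_steps - 1)  # fraction of class A in this mixture
        mixtures.append([
            f * a + (1.0 - f) * b + rng.gauss(0.0, noise_sd)
            for a, b in zip(spec_a, spec_b)
        ])
    return mixtures
```

Repeating the call with increasing `noise_sd` would produce one column per noise level, which is how a composite image such as that in Fig. 6 could be assembled before classification.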

The  full  simulation  of  all  classes  (not  shown)  reveals  that 
mixed  pixels  generally  can  be  classified  as  the  constituent 
classes with the higher concentration. Clearly, boundary
pixels between epithelium and fibroblast-rich stroma must be handled
with care. The increase in boundary pixels at lower
resolution also implies that this type of systematic mis-assignment
may arise more frequently. The rate of
occurrence of boundary pixels may be even lower for
synchrotron-based imaging that is conducted at higher pixel
density or in emerging approaches that utilize synchrotron-based
interferometers and array detectors. The simulated
example above, however, demonstrates that simply oversampling
a spatial region to increase pixel density may
allow for better definition of the interface and assignment
of pixels, though it will not address spectral purity. Hence,
for  analyses  based  on  spectral  discrimination,  mixture 
models  will  have  to  be  developed  based  on  entire  spectra. 
For  example,  multivariate  curve  resolution  techniques  hold 
promise. 

A further complication arises in using data from histologic
classification for pathologic diagnoses. For example,
the boundary epithelial pixels classified above may
disproportionately  contribute  to  classification  errors.  We 
have  found  evidence  for  the  same  in  studies  for  both  cancer 
pathology  and  for  histology  in  tissue  from  different  organs. 
For  example,  the  boundary  pixels  in  benign  tissue  get 


Fig. 6 Mixture models and classification for prostate histology.
a Absorbance at 1,080 cm⁻¹ for three classes and their mixtures.
The first column contains mixtures of epithelial cell spectra with
the average spectrum from fibroblast-rich stroma and mixed stroma.
The second and third columns contain mixtures with fibroblast-rich
and mixed stroma, respectively. The concentration changes from 0 to
100% linearly along the y-direction as indicated by the color bar
in c. b Along the x-axis of the composite image, the noise in each
cell increases linearly. Error bars are standard deviations of
noise in the spectra. c Classified image for the data, demonstrating
the effect of composition and noise on classification. d Probability
profiles of the three cell types at columns 1 and 25,
demonstrating the effect of noise
Table 1 Correlation of composition for samples between 6.25-µm pixel sizes and other pixel sizes

Pixel size (µm)   Epithelium         Fibroblast-rich stroma   Mixed stroma
10                0.9913x (0.9976)   0.9847x (0.9923)         1.0300x (0.9957)
20                1.0156x (0.9906)   0.9671x (0.9775)         1.0473x (0.9787)
25                1.0404x (0.9896)   0.9768x (0.9624)         1.0262x (0.9617)
30                1.0720x (0.9773)   0.9683x (0.9507)         1.0175x (0.9363)
50                1.1180x (0.9459)   0.9410x (0.8947)         1.0390x (0.8723)

The first value in each cell denotes the composition factor for that pixel size and class. For example, for every 100 µm², the area of epithelial pixels
at 10-µm pixel size is 99.13% of that at 6.25-µm pixel size. Increasing/decreasing numbers represent pixels being increasingly/decreasingly
classified as that class. The ratios are not uniform for every sample and the regression coefficient of the best fit line passing through the origin is
provided in parentheses in each table cell. Increasing pixel sizes reflect greater variance from the fit line


misclassified as cancerous, leading to the major source of
error in applying this approach to pathology. At this time,
the evidence is anecdotal and needs further investigation to
quantify the extent of the error and its mitigation by
advanced numerical processing. The last interesting aspect
of lower spatial resolution is that it tends to over-predict
certain classes. For example, Table 1 demonstrates the
regression results of each sample's composition against that
obtained at 6.25 µm for three classes. While the regression
coefficient is high, it is clear that epithelial and mixed
stroma fractions are overestimated and fibroblast-rich
stroma is underestimated with increasing pixel size. There
are differences based on underlying pathology. For example,
normal epithelium is generally encountered in 10- to
40-µm-wide strips, while high grade tumor may be
hundreds of micrometers to millimeters in size. The
regression coefficient, which reflects individual sample
variability, decreases with increasing pixel size. In spectroscopic
models to predict diseases that include morphological units
but are based on average spectra, mixed pixels may lead to
estimates with large errors. For example, a 1:1 mixed region
of epithelial and fibroblast pixels at 6.25-µm pixel size
increases to ca. 1.19:1 for 50-µm pixel size. Hence,
histologic mixture models may not be estimated correctly at
limited spatial resolution, providing evidence that the
percentage content of cell types in a limited field of view is
likely to be a less robust measure of tissue histopathology.
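
The ratio of ca. 1.19:1 quoted above follows directly from the composition factors in Table 1, as this short worked calculation shows. The constant and function names are introduced only for the example; the two factors are copied from the 50-µm row of the table.

```python
# Composition factors at 50-µm pixel size, taken from Table 1.
EPITHELIUM_50 = 1.1180
FIBROBLAST_STROMA_50 = 0.9410


def apparent_ratio(true_ratio, factor_numerator, factor_denominator):
    """Scale a true class ratio by the per-class composition factors."""
    return true_ratio * factor_numerator / factor_denominator


# A 1:1 epithelium-to-fibroblast region measured at 6.25 µm,
# as it would be estimated at 50-µm pixel size.
ratio = apparent_ratio(1.0, EPITHELIUM_50, FIBROBLAST_STROMA_50)
print(f"{ratio:.2f}:1")  # prints 1.19:1
```

The same function applied to other rows of Table 1 shows how the distortion grows with pixel size.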

Evolution  and  capabilities  of  current  instrumentation 

To overcome confounding by mixing, as discussed above,
microspectroscopy was proposed as an alternative [40].
Single  spectra  (non-FTIR)  have  been  recorded  from 
microscopic  samples  for  over  50  years  [41]  by  restricting 
light  incident  on  the  sample  through  an  aperture.  More  than 
one  point,  however,  is  required  for  tissue  analysis.  Hence, 
sequentially  rastering  the  point  at  which  spectra  are 
recorded,  termed  mapping  or  point  microscopy,  was 
proposed  [42].  A  practical  instrument  obtained  by  coupling 
an interferometer, a microscope, and automated stage in the
late 1980s [43] helped in numerous materials science [44],
forensic [45], and biomedical [46, 47] studies. Unfortunately,
the mapping approach has a number of drawbacks in
realizing  the  goal  of  an  FTIR  microscopy  analog  to  optical 
microscopy  [48]. 

More than 85% of cancer arises in epithelial cells, which
often form surface layers that are 10- to 100-µm wide. As
we demonstrated in the previous section, however, a
resolution higher than ca. 10 × 10 µm is preferable.
Consequently, the illuminated spot at the sample has to be
made smaller; throughput decreases proportionally, which
in turn decreases the signal-to-noise ratio (SNR) of acquired
spectra. Orders of magnitude brighter sources, e.g., synchrotrons,
may be employed to recover the lost SNR.
Unfortunately, synchrotrons or free electron lasers [49] are
prohibitively expensive and no laboratory lasers exist for
the wide spectral region. An alternative is to average
successive measurements (co-adding) to increase the SNR
statistically. Since the SNR increases only as the square
root of the number of averaged spectra, long averaging
periods are required. The situation may be mitigated by
using higher condensing optics, sources at higher temperatures,
slightly faster scanning than used here,¹ gain
ranging [50], or ultra-sensitive detectors [51]. Even if a
hypothetical instrument with all these advances were
constructed, only a ca. 10- to 20-fold reduction in time would be
obtained. Furthermore, this calculation underestimates the
time required by not considering losses due to diffraction or
stage movement.
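
The cost of co-adding can be made concrete with the square-root relation stated above. The sketch below assumes only that SNR scales as the square root of the number of averaged scans; the function names and example values are illustrative.

```python
import math


def snr_after_coadding(snr_single, n_scans):
    """SNR after averaging n_scans spectra, assuming SNR grows as sqrt(N)."""
    return snr_single * math.sqrt(n_scans)


def scans_for_snr_gain(gain):
    """Number of co-added scans needed to raise the SNR by a factor `gain`."""
    return math.ceil(gain ** 2)
```

Under this assumption, recovering a tenfold loss in SNR requires 100 co-added scans, that is, a hundredfold longer acquisition.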

In prostate tissue, for example, the situation is similar to
Fig. 1. Epithelial cells form 10- to 35-µm-wide foci around
the cross-sections of ducts. Ducts appear as white circles in
Fig. 1b, surrounded by epithelial cells that are depicted in
blue. To analyze this morphology, aperture dimensions of
ca. 6 µm × 6 µm (~ cell size) are proposed [31]; for this
case, the mapping approach would require ca. 1,028 h for a
500 µm × 500 µm sample [31]. Hence, mapping is not a
viable option. In contrast to point mapping using apertures,
large fields of view are measured in FTIR imaging.
Contributions from different sample areas in imaging are
separated by an array of mid-IR-sensitive detection elements
in the manner of imaging with CCD devices for
optical microscopy. By coupling the multichannel detection
of focal plane array (FPA) detectors with the spectral
multiplexing advantage of interferometry, an entire sample
field of view is spectroscopically imaged in a single
interferometer scan [52]. Depending on the microscopy
configuration, thousands of moderate resolution spectra can
be acquired at near-diffraction-limited spatial resolution in
minutes [53, 54]. The time advantage over mapping is
nominally the number of pixels in the FPA (16- to 65,000-fold)
but the noise characteristics of FPAs are poorer than those of
sensitive single point detectors [55]. Hence, the SNR-
normalized advantage is lower [56]. Faster detectors are
being used for imaging and promise significantly higher
SNR in the same time. For example, we have employed a
128 × 128 element MCT array operating at ca. 16 kHz to
acquire a full data set in ca. 0.07 s [unpublished]. These
rates of data acquisition are approximately a factor of 10
higher than commercially available, but are required for
practical data acquisition times. Increase in data acquisition
speed remains a bottleneck for applications of IR imaging
to routine clinical studies. Coupled with the complexity and
cost of instrumentation, present technology provides preliminary
capability but is likely to prove a barrier to
practical clinical translation.

1 There is no advantage to faster scanning once the modulation
frequency has reached optimum level for MCT detectors (1 MHz).
The reduced time to observe signal then decreases the SNR.

High-throughput  sampling  and  statistical  pitfalls 

Quantitative  analyses  of  results 

The best imaging instruments (which employ sensitive
detectors and a small multichannel advantage) can acquire
data in about 0.1% of the time required for mapping for
equivalent parameters. Hence, point-mapping studies in
pathology can typically exceed the following figures in only
one category: spatial resolution (ca. 15-20 µm), number of
patients (ca. 50), or number of spectra recorded per
patient (ca. 100). These numbers may typically be improved
by an order of magnitude with imaging. For example, a recent
report analyzed ca. ten million spectra from ca. 1,000
samples at a spatial resolution of 6.25 µm [26]. This
quantitative  validation  is  necessary  for  any  automated 
biomarker  approach  (vide  infra)  [57].  Studies  are  underway 
in  our  and  other  laboratories  to  correlate  spectral  patterns 
with  other  physiologic  and  pathologic  conditions;  recent 
published  studies  verify  the  robustness  and  potentially  wide 
applicability  of  FTIR  microscopy  [58,  59]. 


Sample  size 

Though these studies demonstrate potential [60, 61],
considerable debate exists on reproducibility and accuracy
measures for larger studies [29]. The first response of many
practitioners to new data is to question its validity on the
basis of limited statistical confidence. A detailed understanding is
emerging from the work of several groups regarding
appropriate sample control [62] and confounding factors
due to biology [63]. Inherent differences between patient
cohorts, effects of sample preparations, and measurement
noise are topics that can be addressed with the available
imaging technology but are yet to be fully explored. Hence,
validating robust spectral markers for large sample populations
[64, 65] is exceptionally challenging and the
possibility of chance and bias influencing results exists.

Most  importantly,  the  fundamental  question  of  sample 
size required has remained open. There are two major
concerns. The first is the optimal sample size for forming
calibration sets and a prediction algorithm. Second, investigators
must  determine  whether  the  results  shown  can  be  supported 
by  statistical  considerations.  While  the  first  problem  is 
essentially  one  of  optimizing  a  model  and  prediction 
algorithm,  the  second  impacts  the  quality  of  results  and 
claims  of  applicability  directly.  In  this  manuscript,  we 
examine  only  the  second  aspect.  Determining  the  optimal 
sample  size  to  form  robust  models  is  a  more  involved 
problem  and  is  discussed  elsewhere.  The  statistical  validity 
of  obtained  results  and  dependence  on  data  acquisition 
parameters are discussed later in this manuscript. Specifically,
we estimate sample size based on the standard error
for the area under the curve for an ROC curve.
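
One common way to relate sample size to the standard error of the area under an ROC curve is the Hanley-McNeil approximation. The text does not state which expression is used here, so the sketch below is an assumption offered purely for illustration; the function names, the equal-class-size simplification, and the search limit are likewise hypothetical.

```python
import math


def auc_standard_error(auc, n_pos, n_neg):
    """Hanley-McNeil approximation to the standard error of the AUC.

    ASSUMPTION: this formula is a widely used choice, not necessarily
    the one applied in this manuscript.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def samples_per_class_for_se(auc, target_se, max_n=1_000_000):
    """Smallest equal class size n (n_pos = n_neg = n) whose standard error
    does not exceed target_se; returns None if max_n is not sufficient."""
    for n in range(2, max_n + 1):
        if auc_standard_error(auc, n, n) <= target_se:
            return n
    return None
```

As a usage example, `samples_per_class_for_se(0.9, 0.02)` returns the number of independent samples per class that would be needed to pin down an anticipated AUC of 0.9 to a standard error of 0.02 under this approximation.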

Gold  standard 

The  selection  of  pixels  as  gold  standards  needs  great  care.  It 
must  be  done  independently  of  any  classifier  training  or 
validation,  thus  ensuring  a  blinded  study  design.  Once  the 
gold  standard  set  is  determined,  it  must  not  be  changed. 
This  will  ensure  that  there  is  no  bias  in  the  process.  Care 
must  be  taken  to  avoid  pixels  that  do  not  lie  on  the  tissue  or 
those  that  are  at  the  boundary  as  these  may  artificially 
inflate  the  error.  The  use  of  all  pixels  in  an  image  has  been 
suggested  and  their  exclusion  has  been  proposed  to 
contribute  selection  bias.  Selection  bias,  however,  does 
not  arise  in  pixels  that  are  chosen  independent  of  validation 
algorithms.  The  exclusion  of  boundary  pixels  is  necessary 
in  both  training  (to  avoid  spurious  probability  distribution 
functions)  and  validation  (to  prevent  introduction  of  errors). 
There  are  major  technological  difficulties  in  relating  stained 
visible  to  IR  images  from  unstained  tissue  due  to  changes 
during staining, leading to errors. Hence, it has been
proposed that the exclusion of boundary pixels is akin to
the performance of a classifier with a reject option for the
boundary.
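
The exclusion of off-tissue and boundary pixels can be sketched as a simple neighborhood test on a labeled gold-standard image. This is a minimal illustration, not the procedure used in this work: the 4-neighbor rule, the function name, and the convention that label 0 marks off-tissue pixels are all assumptions.

```python
def interior_mask(labels, background=0):
    """Mark pixels that are safe to use as gold standard.

    A pixel is kept only if it lies on tissue (label != background) and
    all four of its neighbours carry the same label. Pixels on the image
    edge are treated as boundary pixels and rejected.
    """
    rows, cols = len(labels), len(labels[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            lab = labels[r][c]
            if lab == background:
                continue  # not on tissue
            keep = True
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if not inside or labels[rr][cc] != lab:
                    keep = False  # touches another class, background, or edge
                    break
            mask[r][c] = keep
    return mask
```

Because the mask depends only on the labels, it can be fixed before any classifier is trained or validated, which is consistent with the requirement that the gold standard be chosen independently.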

Sampling,  archiving,  and  consistency 

While  it  is  unclear  what  an  optimal  sample  size  would  be,  it 
is  clear  that  a  large  number  of  tissue  samples  are  needed  for 
effective  validation.  While  it  may  theoretically  be  possible 
to  train  on  a  single  sample,  validation  of  a  protocol  is 
required  on  more  samples.  We  recognized  that  one  does  not 
need  to  observe  the  full  surgically  resected  tumor  for 
validating  IR  protocols,  but  would  need  a  representative 
small  section.  Hence,  we  employed  tissue  microarrays 
(TMAs)  [66]  as  a  platform  for  high-throughput  sampling. 
TMAs  consist  of  a  large  number  of  small  tissue  samples 
arranged  in  a  grid  and  deposited  on  the  same  substrate. 
They  are  typically  manufactured  by  embedding  cylindrical 
cores  in  a  receiving  block  and  sectioning  the  block 
perpendicular  to  the  long  axis  of  the  core.  Thin  sections 
are  then  floated  on  to  a  rigid  substrate  for  analysis.  The 
technique  facilitates  rapid  visualization  of  results  of  any 
classification  protocol,  while  revealing  localization  and 
prevalence of any errors. Sample processing throughput may
easily be increased 100-fold, valuable tissues are optimally
utilized, and consecutive TMA sections can be used to
correlate with staining results. Construction and analysis of
TMAs have been automated, further increasing the
throughput. For spectroscopists, TMAs provide a ready source
of tissue to test hypotheses and develop prediction models.

The  validity  of  employing  TMAs  for  prostate  cancer 
research  and,  especially,  for  cancer  grading  has  been 
addressed by a number of authors [67]. For example, a
study of genitourinary pathologists [4] with images from
TMA cores found that ca. 90% considered this approach
useful for resident training and for pathology teaching.
Further,  Gleason  score  was  easily  assigned  to  each  TMA 
spot  of  a  0.6-mm-diameter  prostate  cancer  sample.  Hence, 
the  utility  of  TMAs  is  not  only  in  providing  numerous 
samples  in  a  compact  manner  for  the  advantages  above, 
but  also  in  consistency  of  the  diagnoses  and  precision  in 
analyzing  similar  areas.  Virtual  tissue  microarrays  could 
be  constructed  from  different  areas  of  large  samples,  thus 
providing many sub-samples for within-patient and
among-patient comparisons. This approach has not yet been
reported but is likely a useful extension of the TMA concept.

Prediction  algorithms  and  high-throughput  data 
analysis 

Univariate  algorithms 

The major technological advances of fast FTIR microscopy
and high-throughput tissue sampling have been addressed
by imaging and TMAs, respectively. There is still some
confusion  and  widespread  disagreement,  however,  about 
the  “best”  approach  to  extract  histopathologic  information 
from  FTIR  imaging  data.  Several  early  manuscripts  employ 
univariate  correlations  to  disease  states  [68].  While  the 
results  were  exciting,  it  is  now  realized  that  they  were 
statistically  flawed  and  did  not  necessarily  contain  a 
fundamental  basis  in  cancer  biology.  To  our  knowledge, 
there is no manuscript that has expressly demonstrated,
using statistical arguments, why univariate analyses are
likely to fail. There is widespread consensus and anecdotal
evidence among practitioners, however, that argues against
the approach. Consider the distributions for a univariate
measure (absorbance at 1,080 cm⁻¹ normalized to
the amide I peak height) for benign and malignant cases, as
shown in Fig. 7.

Fig. 7 Distribution of absorbance at 1,080 cm⁻¹ (a.u.) for
individual spots and all pixels from each class, normalized by
the total number of pixels in the class, demonstrates that the
examination at patient level and at a global level may not
correspond

The normalized histograms reveal that, for specific
single samples, the distribution of absorbance at pixels
clearly indicates the metric to be a good one for
cancer discrimination. When the distribution from all
samples is considered, however, there is little difference in
the distributions. Hence, many univariate measures
described in the literature do not hold up in wide population
testing. A TMA-based, high-throughput validation can
readily show that such a measure is not a good one overall
even though it does discriminate some samples. In Fig. 7, a
cutoff value can generally be found that distinguishes disease
in individual samples, leading to the erroneous conclusion
that the feature is universally indicative of disease state.
Since a typical infrared spectrum has numerous frequencies
and even non-chemically specific features that can provide
discrimination, a small number of samples increases the
probability of finding such discrimination by chance alone.
Univariate measures that apparently provide discrimination
when none exists can be equated to the false discovery rate
(FDR) [69] of metrics. The FDR is very different from the
p-value for determining that a metric separates two
distributions; a much higher FDR can be tolerated than can a
p-value. Similarly, a false negative rate has been proposed
[70], which is not critical for our case as we have observed
high accuracy without use of any erroneously left out
metrics. While detailed calculations and their underlying
concepts are too lengthy to reproduce here, for the sake of
completeness, it suffices to say that for the expected number
of metrics demonstrating discrimination, the FDR tends to
zero for more than ca. 30 samples. While correlations due to
chance can be minimized by this approach, there is potential
for unknown bias or error in prediction for small numbers of
samples. Hence the algorithm must be integrated with
sampling considerations.
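
The effect of sample number on chance discrimination can be illustrated with a small simulation. It is not the calculation referred to above: the data are pure Gaussian noise, the criterion is perfect separation by a single cutoff, and the function name and parameters are hypothetical. For two classes of n samples each drawn from the same distribution, the probability of perfect separation is 2·n!·n!/(2n)!, which is 0.1 for n = 3 and vanishingly small for n = 30.

```python
import random


def fraction_spuriously_separating(n_per_class, n_features, seed=0):
    """Fraction of pure-noise features that perfectly separate two classes.

    Both classes are drawn from the same distribution, so every
    separation that is found arises by chance alone.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_features):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
        # A single cutoff separates the classes if their ranges do not overlap.
        if max(a) < min(b) or max(b) < min(a):
            hits += 1
    return hits / n_features


if __name__ == "__main__":
    for n in (3, 10, 30):
        print(n, fraction_spuriously_separating(n, 2000))
```

With thousands of candidate frequencies in a spectrum, even a small per-feature probability yields many apparent markers when only a few samples are examined.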

Multivariate  algorithms 

It  was  argued  in  the  previous  section  that  univariate 
analysis  may  not  provide  a  good  measure  of  the  population 
distribution. It can alternatively be argued that the
individual differences in univariate measures are masked if
population measures of the same are employed. Similarly,
multivariate  techniques  may  mask  the  individual  measures 
in  population  testing.  Hence,  our  philosophy  has  been  to 
employ  a  multivariate,  supervised  classification  in  which 
the  metrics  are  derived  from  univariate  analyses.  This 
enables  us  to  carefully  examine  each  metric  for  both 
population  as  well  as  individual  sample  relevance.  While 
unsupervised  clustering  approaches  provide  good  insight 
into  spectral  similarity,  a  supervised  method  forces  a 
relation  to  common  clinical  knowledge.  For  example,  as 
shown  in  Fig.  4  for  prostate  tissue,  we  consider  a  ten-class 
model to determine histology. The drawback is that
sensitivity to individual samples is sacrificed for
generality. One could potentially combine
clustering and supervised classification. Clustering
information on the training data set would emphasize individual
sample distributions, which would allow for supervised
classification  tailored  to  each  cluster  type.  Such  an 
approach  has  not  been  implemented  yet  but  is  being 
attempted  in  our  laboratories  to  classify  samples  optimally. 

Dimensionality  reduction 

It  is  well  recognized  that  the  spectrum  at  each  pixel  needs 
to  be  reduced  to  a  smaller  set  of  useful  descriptors  that 
capture  the  essential  information  inherent  in  the  spectrum. 
The  reduction  of  full  spectral  information  to  essential 
measures  helps  eliminate  from  consideration  those  spectral 
features that have no information (non-absorbing
frequencies), little biochemical significance (e.g., apparent
absorption at non-chemically specific frequencies),
inconsistent measures that may degrade classification, and
those with redundant information. The number of useful
measures is significantly smaller than the number of spectral
resolution elements and, hence, the process is also termed
dimensionality reduction. Dimensionality reduction and
further refinement (vide infra) also help reduce the incidence
of prediction by chance alone, as well as computation time
and storage requirements. Potential measures of a spectrum's
useful features are termed metrics and are defined manually
in our scheme.

It may be argued that the metrics are not selected in an
objective manner because a human performs this task, and
that computer routines must be employed instead. While the
use of an automated computer program is most certainly
objective and reproducible, the algorithm that drives such
programs is itself generated from spectroscopy knowledge. A
well-trained spectroscopist can recognize spectral features
and assign them to their appropriate biochemical basis.
While a computer algorithm may be able to enhance subtle
features in the spectrum, automated peak-picking
algorithms run the risk of substantial error as they are based
on very specific criteria that may not be universally
valid. We believe that computer algorithms are better suited
to finding correlations and patterns that a human cannot,
owing to the sheer size and complexity of the data. Hence,
the process of determining which spectral features to
consider is entirely manual in our approach. It must be
emphasized that the universal set of metrics is selected
manually but that the data reduction step to the set of
metrics used in algorithms is based entirely on objective
algorithms. Manual refinement of metrics for classification
is, obviously, not recommended because it risks overlooking
specific features, biasing the selection toward specific
feature sets, or failing to find the optimal set of metrics for
a classifier. Dimensionality reduction is also intimately linked
to the data quality and classification algorithm employed.

Classification  algorithm 

A  number  of  supervised  algorithms  have  been  applied  to 
dimensionally  reduced  data,  including  those  based  on  linear 
discriminant  analysis,  neural  networks,  decision  trees,  and 
modified Bayesian classifiers. An intermediate step in
some  of  these  algorithms  provides  for  a  fuzzy  result  in 
which  every  pixel  has  a  probability  of  belonging  to  every 
class.  For  example,  in  our  approach,  each  pixel  can  have  a 
probability  (between  zero  and  one)  of  belonging  to  each 
class.  A  discriminant  function  then  assigns  each  pixel  to  a 
class  based  on  a  decision  rule.  The  pre-discriminant  data 
set, termed the rule imaging set, contains important
information. In our algorithm, it is a direct measure of the
probability of the pixel belonging to the class. Hence, the
probability value may be used to compare the potential of
two protocols to distinguish a cell type or to quantify
confidence in results for tissue classified by different
methods.

Anal Bioanal Chem (2007) 389:1155-1169

Measures  of  accuracy  and  optimization 

We prefer the use of the AUC for both optimizing
algorithms and for validating results. Confidence in the
value of the AUC, hence, is the primary test for the
validity of developed algorithms and is characterized by the
standard error of the value. For example, in validating the
discrimination of epithelial from stromal pixels in a blinded
validation set, the cumulative distribution of AUC in a
TMA is shown in Fig. 8. More than 20% of the spots had
an AUC > 0.95 and no AUC value below 0.8 was recorded.
One drawback of using ROC curves and AUC values is that
the results are valid for one-at-a-time classification. Hence,
we  have  analyzed  here  the  segmentation  of  epithelium  from 
all  other  cell  types.  The  tissue  is  classified  into  ten  classes 
as  before  but  the  results  are  lumped  into  epithelial  and  non- 
epithelial  pixels.  Further,  not  all  TMA  cores  have  all  types 
of  cells.  Hence,  the  two-class  model  also  allows  us  to 
examine  a  large  number  of  samples.  Last,  we  excluded 
cores  that  did  not  contain  at  least  100  pixels  of  each  class  to 
leave  103  cores  for  the  analysis. 

Fig. 8 Distribution of AUC values in a TMA for discriminating
epithelium from stroma using the ten-class model

Quantitative measures of performance and accuracy are
perhaps the weakest portion of reports using IR
spectroscopy for cancer pathology. Typically, sensitivity and
specificity have been employed as summary measures. While
these are indeed very relevant, we demonstrate that they are
insufficient and that classification analysis must utilize more
measures to understand the process. Specifically, the use of
receiver operating characteristic (ROC) curves [71] is an
excellent direction. The area under the ROC curve is a
further summary measure that provides both a quantitative
understanding of the discrimination potential of the model
and a convenient measure to compare multiple
classification models. The third tool we introduced was the
confusion matrix. While ROC curves provide the potential
for correct classification of one binary rule at a time,
confusion matrices correspond to a particular point on the
ROC curve under the constraints of accuracy measures of
other classes. These also directly correspond to the final
segmentation of the rule image under an optimization
condition. The optimization condition may simply be the
maximization of the accuracy or may be the minimization
of certain types of errors.
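
The relation between the rule image, the AUC, and the confusion matrix can be sketched as follows. This is an illustration under stated assumptions: the AUC is computed with the rank (Mann-Whitney) formulation, the discriminant is taken to be a plain maximum-probability rule, and all function names and the toy data are hypothetical rather than the implementation used in this work.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a positive pixel outscores a negative
    one, with ties counted as one half (Mann-Whitney formulation).
    Quadratic in the number of pixels; adequate for a small sketch."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def assign_classes(rule_image):
    """Segment a rule image: each pixel (a list of class probabilities)
    goes to the class index with the highest probability."""
    return [max(range(len(p)), key=lambda k: p[k]) for p in rule_image]


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows are gold-standard classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


if __name__ == "__main__":
    rule = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.4, 0.35, 0.25]]
    predicted = assign_classes(rule)
    print(confusion_matrix([0, 2, 1], predicted, 3))
```

The AUC summarizes every possible threshold for one binary question, whereas the confusion matrix reports the outcome of one particular decision rule applied to all classes at once.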

Discriminant  and  class  assignment 

In  a  multi-class  analysis,  our  approach  to  evaluating  ROC 
curves  for  a  class  is  one  at  a  time,  i.e.,  all  other  classes  are 
essentially  lumped  in  the  rule  data  and  the  highest 
probability  of  the  lumped  ensemble  is  compared  to  the 
class  whose  ROC  curve  is  being  built.  Hence,  the  AUC 
values  must  be  regarded  as  a  potential  for  classification. 
They  are  best  suited  to  answer  the  binary  question  of 
whether  a  pixel  is  correctly  identified  or  not  when 
considering a single class. This method is ideally suited to
a cascaded classifier that resolves one class at a time. Such
a classifier has not
been  reported  yet  but  would  provide  a  means  to  explicitly 
determine  the  error  for  any  given  classification  scheme. 
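
One way to read the one-at-a-time construction described above is sketched here. The scoring rule (probability of the target class minus the highest probability among the lumped remaining classes) is our interpretation for illustration, and the function names and toy rule data are hypothetical.

```python
def one_vs_rest_scores(rule_image, target):
    """Score each pixel for class index `target` against the lumped rest:
    target probability minus the highest probability of any other class."""
    scores = []
    for probs in rule_image:
        rest = max(p for k, p in enumerate(probs) if k != target)
        scores.append(probs[target] - rest)
    return scores


def roc_points(scores, is_target):
    """(false positive rate, true positive rate) pairs obtained by sweeping
    a threshold over the observed scores, strictest threshold first."""
    n_pos = sum(1 for t in is_target if t)
    n_neg = len(is_target) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, is_target) if t and s >= thr)
        fp = sum(1 for s, t in zip(scores, is_target) if not t and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    rule = [[0.8, 0.1, 0.1], [0.2, 0.5, 0.3], [0.6, 0.3, 0.1], [0.3, 0.3, 0.4]]
    truth = [True, False, True, False]  # gold standard: is the pixel class 0?
    print(roc_points(one_vs_rest_scores(rule, 0), truth))
```

Repeating the construction with each class as the target gives one ROC curve per class, which is why the resulting AUC values describe a potential for classification rather than the final multi-class segmentation.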


Experimental  parameters  and  classification 

Here,  we  take  advantage  of  the  trading  rules  of  FTIR 
spectroscopy  and  imaging  to  model  the  effects  of  the 
experimental  parameters  on  the  classification  process. 
While the signal-to-noise ratio (SNR) and resolution are
generally  arbitrarily  fixed  in  most  studies,  we  demonstrate 
their  importance  in  classification. 

Effect  of  signal  to  noise  ratio 

There are two issues: first, what is the “best” SNR to
formulate algorithms and, second, given an algorithm, what is
the lowest SNR that would provide adequate classification. Only
the  latter  issue  is  examined  here.  As  with  conventional 
FTIR  spectrometers,  imaging  spectrometers  obey  the 
trading rules of IR spectroscopy. Hence, if an n-fold
reduction in SNR provides the same results, data
acquisition will be n²-fold faster. Thus, in addition to
revealing an interesting fundamental behavior of the
classifier, the role of SNR has a direct bearing on the speed
at which data are acquired.

Fig. 9 a Noise in the data set as a function of added random
noise. b Effect of spectral noise on the accuracy of
classification as measured by AUC values

We examined classification accuracy as a function of
average  spectral  noise.  To  strictly  examine  the  effect  of 
noise,  data  must  be  acquired  at  different  co-added  spectral 
numbers.  The  time  required  for  imaging  an  array  multiple 
times,  however,  is  prohibitive.  Hence,  we  computationally 
added  random,  Gaussian  noise  to  the  original  spectral  data. 
Peak-to-peak and root mean square (rms) noise were
measured in the 1,950- to 2,150-cm⁻¹ region adjacent to
the amide I peak (see footnote 2). Representative single pixel
spectra from the data sets are shown, as a function of noise,
in Fig. 9a.
We  additionally  plotted  the  observed  noise  levels  against 
the  added  noise  to  verify  linearity  (plot  not  shown).  The 
linear  relationship  conforms  to  the  expected  result  and 
provides  a  scaling  factor  to  express  the  equivalent  reduction 
in  data  acquisition  time  (co-addition)  that  would  be  realized 
at  that  noise  level.  For  example,  the  addition  of  0.005  a.u. 
of  noise  raises  the  peak-to-peak  noise  from  0.0013  to 
0.015  a.u.,  corresponding  to  a  decrease  in  data  acquisition 
time  by  a  factor  of  ca.  100  for  this  data  set.  In  addition  to 
increasing  noise,  we  employed  an  algorithm  based  on  an 
MNF  transform  [72,  73]  to  mathematically  eliminate  noise. 
The  observed  peak-to-peak  noise  was  0.00017  a.u., 
corresponding  to  an  increase  in  data  acquisition  time  by  a 
factor greater than ca. 100. Hence, the data examined span
about 5 orders of magnitude in collection time.
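
The conversion from a noise level to an equivalent acquisition time can be sketched as below, assuming that noise falls as the inverse square root of the acquisition time when scans are co-added. The helper names and the flat synthetic baseline are hypothetical; only the 0.0013 and 0.015 a.u. noise levels are taken from the text.

```python
import random


def peak_to_peak(values):
    """Peak-to-peak fluctuation over a spectral region free of absorption."""
    return max(values) - min(values)


def add_gaussian_noise(spectrum, sigma, seed=0):
    """Return a copy of the spectrum with zero-mean Gaussian noise added."""
    rng = random.Random(seed)
    return [a + rng.gauss(0.0, sigma) for a in spectrum]


def acquisition_time_factor(noise_before, noise_after):
    """Relative acquisition time implied by a change in noise level,
    assuming noise scales as 1/sqrt(time). Values below one mean that
    the noisier data could have been collected faster."""
    return (noise_before / noise_after) ** 2


if __name__ == "__main__":
    baseline = [0.0] * 200  # hypothetical flat region, absorbance units
    noisy = add_gaussian_noise(baseline, 0.005)
    print(peak_to_peak(noisy))
    # Noise levels quoted in the text: 0.0013 a.u. raised to 0.015 a.u.
    print(1.0 / acquisition_time_factor(0.0013, 0.015))
```

For the quoted noise levels the ratio is (0.015/0.0013)², roughly 133, which is consistent with the factor of ca. 100 stated above.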

The average height of the amide I peak was 0.42 a.u. in
all cases, providing an SNR of 2,500 (MNF-corrected data)
to 1.5 for the data sets. Accuracy as a function of the noise
level is shown in Fig. 9b. While the x-error bars indicate the
standard deviation of noise levels in pixels, the y-error bars
indicate the standard deviation in AUC values of all ten
classes. As a general rule, the classification improves with
lower noise levels. We first note that the classification does
not become perfect for any noise level and there is a
significantly diminishing return in increasing the SNR
beyond a certain level. At the other end, the ability to
distinguish classes is entirely lost at noise levels of ca. 0.1.
Performance across multiple data sets observed using our
prediction model indicates that the increases demonstrated at
noise levels lower than ca. 0.003 a.u. are within the variance.
Hence, there is little benefit to decreasing the noise levels
below ca. 0.003 a.u. for this data set, or to increasing the
SNR beyond ca. 150. It must be emphasized that the model,
prediction algorithm, and discriminant function are
intimately linked in a non-linear manner. While this makes it
impossible to predict the behavior of all classification
approaches in general, this simple exercise may be conducted
to determine the optimal data acquisition parameters. For
our selected metrics and model, it appears that the data
acquisition time can be decreased by a factor of ca. 3
without significant degradation in accuracy.

2 It is noteworthy that we are examining trends in the absorbance
spectra. Strictly, SNR should be measured in single beam spectra to
relate rigorously to theory. It can be shown, however, that the trend
will hold approximately for the absorbance spectra as well. Many
practitioners advocate the use of rms SNR. We are employing
peak-to-peak fluctuations over the same spectral range. Hence, the
noise values we obtain will be higher but will follow the same trend.

Spectral  resolution 

We  next  examined  the  effect  of  spectral  resolution  on  the 
results  that  would  be  obtained  using  the  developed 
algorithm.  As  in  the  previous  section,  the  data  were  not 
re-acquired  but  were  downsampled  from  acquired  data 
using  a  neighbor  binning  procedure.  Spectra  from  the  same 
epithelial  class  pixel,  at  different  resolutions  (Fig.  10a), 
demonstrate the effect of downsampling on feature
definition. Figure 10b demonstrates, first, that the
peak-to-peak noise levels over the region remain the same
with spectral resolution. As previously observed, noise is an
important control in comparing spectra; the peak-to-peak
noise over the same number of data points was preserved by
neighbor binning. In practice, a constant-throughput
spectrometer would provide an SNR (or noise level, in this
case) that decreases linearly with resolution. Since most array
detectors can be operated with higher integration times, it is
fair to assume that the time advantage in decreasing
resolution would be linear. Second, the performance of the
classifier is very nearly the same for finer spectral resolutions
and degrades significantly only at 32 cm⁻¹. While the
results may appear to be surprising, a closer analysis of the
basis of the algorithms provides insight into the trends.

Fig. 10 a Spectra obtained by downsampling acquired data to
different resolutions using a neighbor binning procedure. The
inset demonstrates the effect of resolution on narrower features
in the spectrum (x-axis: wavenumber, cm⁻¹). b AUC values for
each class and average AUC values as a function of spectral
resolution demonstrate a decrease only for coarse spectral
resolution

The  classifier  is  based  on  absorbance  and  center  of 
gravity  measures  of  the  peaks.  It  is  well  established  that 
absorbance  is  measured  accurately,  provided  that  the 
FWHH  of  the  peak  is  not  significantly  smaller  than  the 
resolution. The Ramsay resolution parameter, ρ, is a useful
measure that was originally developed for monochromators
but has been shown to be applicable to FTIR spectrometers
as well [74]. Since most bands are broad and peak
absorbances are lower than ca. 0.7, absorbance values are not
expected to be adversely impacted by the measurement
process. With decreasing resolution, however, broadening
within complex peak shapes may lead to observed changes
in the apparent absorption at a specific wavenumber. The
change  itself  may  not  have  a  significant  influence  on  the 
classifier  performance  as  it  depends  on  several  such 
metrics.  A  second  type  of  metric  calculates  the  area  under 
the  curve.  This  is  not  expected  to  be  impacted  significantly 
for  most  peaks.  The  third  type  of  metric  we  have  used  is  the 
center  of  gravity  of  a  spectral  region.  While  spectral 
analyses  ordinarily  attempt  to  locate  the  peak  position  and 
use  it  as  a  metric,  we  chose  the  center  of  gravity  for  its 
sensitivity  to  both  position  and  asymmetrical  shape  changes 
in  complex  spectral  envelopes  observed  in  biological 
samples.  Since  the  classifier  is  based  on  center  of  gravity 
of  a  feature  and  not  on  the  wavenumber  of  the  peak 
maximum,  it  is  a  very  robust  measure  that  is  relatively 
unaffected  by  spectral  resolution  or  noise. 
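
A minimal sketch of the center of gravity metric is given below, taken here as the absorbance-weighted mean wavenumber of a spectral region. The two-Gaussian envelope, the region limits, and the point spacings are hypothetical, and coarse sampling of a fixed band shape is only a rough stand-in for a change in instrument resolution.

```python
import math


def center_of_gravity(wavenumbers, absorbances):
    """Absorbance-weighted mean wavenumber of a spectral region."""
    total = sum(absorbances)
    return sum(w * a for w, a in zip(wavenumbers, absorbances)) / total


def synthetic_band(wavenumbers):
    """Hypothetical asymmetric envelope built from two overlapping
    Gaussian bands centred at 1,650 and 1,630 cm^-1."""
    return [1.0 * math.exp(-0.5 * ((w - 1650.0) / 15.0) ** 2)
            + 0.6 * math.exp(-0.5 * ((w - 1630.0) / 10.0) ** 2)
            for w in wavenumbers]


fine = list(range(1590, 1711, 2))     # 2 cm^-1 point spacing
coarse = list(range(1590, 1711, 16))  # 16 cm^-1 point spacing

cg_fine = center_of_gravity(fine, synthetic_band(fine))
cg_coarse = center_of_gravity(coarse, synthetic_band(coarse))

# Location of the sampled maximum, for comparison with the centroid.
peak_fine = max(zip(synthetic_band(fine), fine))[1]
peak_coarse = max(zip(synthetic_band(coarse), coarse))[1]

if __name__ == "__main__":
    print(cg_fine, cg_coarse)
    print(peak_fine, peak_coarse)
```

The sampled maximum can only fall on a grid point, so its position is quantized by the point spacing, whereas the centroid draws on every point in the region and changes little between the two samplings.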

Generalization  of  developed  algorithms  to  instruments 
and  practical  approaches 

The characterization of classification with regard to
spectrometer performance (SNR) and spectral resolution
provides information to optimize parameters on one
spectrometer. It is unclear, however, if the calibration would
transfer to another spectrometer. We contend that the
potential for a successful transfer is high as the
classification process is relatively insensitive to resolution,
implying that it would only be weakly sensitive to
apodization or to small inaccuracies in wavelength scale.
Similarly, if the SNR of acquired data is used as a control,
perturbations due to fixed pattern noise in focal plane array
detectors or the different use of electronic filters by different
manufacturers are likely to be insignificant in classifying
tissue correctly. Various instrument manufacturers also set
the nominal optical resolution differently in their
instruments. The issue of spatial resolution, of course, is more
complex. Nevertheless, any resolution setting around the
wavelength-limited case will likely provide consistent
results. To our
knowledge,  there  has  been  no  comparison  yet  of  classifier 
performance  across  mid-IR  FTIR  imaging  spectrometers 
using  algorithms  developed  on  one  specific  instrument.  The 
developed  protocol  provides  for  such  a  framework  and 
detailed  results  are  awaited  from  on-going  work  [75]. 

Outlook  and  prospects 

An  exciting  period  in  imaging  tissues  spectroscopically 
with  low  power,  optical  microscopy-comparable  resolution 
is  emerging.  Considerable  work,  however,  needs  to  be 
accomplished  before  this  idea  can  become  a  clinical  reality. 
An  ultimate  goal  of  such  studies  is  to  provide  a  key 
technology for emerging molecular pathology. The
approach promises greatly reduced error rates, automation,
and economic benefits in current pathology practice.
Looking to the future, chemical imaging approaches will be
employed for diagnosing cancers in pre-malignant stages,
before changes become observable by conventional means,
for predicting the prognosis of a lesion, and for
intraoperative imaging in real time. Fundamental studies in
drug discovery and mechanisms of molecular interactions are
further examples that would be enabled by progress in this
area. Doubtless, exciting applications lie ahead and
progress is rapidly being made towards practical
applications, but much work needs to be done to carefully
apply this powerful technology to multiple aspects of
pathology.
Success  in  this  endeavor  promises  to  change  the  practice 
of  pathology  radically  and  alter  the  clinical  management  of 
cancer patients.

Abstract Prostate cancer accounts for one-third of noncutaneous cancers diagnosed in US
men  and  is  a  leading  cause  of  cancer-related  death.  Advances  in  Fourier  transform  infrared 
spectroscopic  imaging  now  provide  very  large  data  sets  describing  both  the  structural  and 
local  chemical  properties  of  cells  within  prostate  tissue.  Uniting  spectroscopic  imaging  data 
and  computer-aided  diagnoses  (CADx),  our  long  term  goal  is  to  provide  a  new  approach  to 
pathology  by  automating  the  recognition  of  cancer  in  complex  tissue.  The  first  step  toward  the 
creation  of  such  CADx  tools  requires  mechanisms  for  automatically  learning  to  classify  tissue 
types, a key step in the diagnosis process. Here we demonstrate that genetics-based machine
learning (GBML) can be used to approach such a problem. However, to analyze this problem
efficiently there is a need to develop efficient and scalable GBML implementations that are
able to process very large data sets. In this paper, we propose and validate an efficient GBML
technique, NAX, based on an incremental genetics-based rule learner. NAX exploits
massive parallelism via the message passing interface (MPI) and efficient rule-matching using
hardware-implemented operations. Results demonstrate that NAX is capable of performing
prostate tissue classification efficiently, making a compelling case for using GBML
implementations as efficient and powerful tools for biomedical image processing.


X. Llora (✉)

National  Center  for  Supercomputing  Applications,  University  of  Illinois  at  Urbana-Champaign, 

1205  W.  Clark  Street,  Urbana,  IL  61801,  USA 
e-mail:  xllora@uiuc.edu 

A.  Priya  •  R.  Bhargava 

Department  of  Bioengineering,  University  of  Illinois  at  Urbana-Champaign,  1304  W.  Springfield  Ave., 
Urbana,  IL  61801,  USA 

A.  Priya 

e-mail:  priya@uiuc.edu 

R.  Bhargava 
e-mail:  rxb@uiuc.edu 

R.  Bhargava 

Beckman  Institute  for  Advanced  Science  and  Technology,  University  of  Illinois  at  Urbana-Champaign, 
405  N.  Mathews  Ave.,  Urbana,  IL  61801,  USA 


Springer 


X.  Llora  et  al. 


Keywords Observer-invariant histopathology • Genetics-based machine learning •
Learning Classifier Systems • Hardware acceleration • Vector instruction •
SSE2 • MPI • Massive parallelism


1  Introduction 

Pathologist  opinion  of  structures  in  stained  tissue  is  the  definitive  diagnosis  for  almost  all 
cancers  and  provides  critical  input  for  therapy.  In  particular,  prostate  cancer  accounts  for 
one-third  of  noncutaneous  cancers  diagnosed  in  US  men.  Hence,  it  is,  appropriately,  the 
subject  of  heightened  public  awareness  and  widespread  screening.  If  prostate-specific 
antigen  (PSA)  or  digital  rectal  screens  are  abnormal,  a  biopsy  is  needed  to  definitively 
detect  or  rule  out  cancer.  Pathologic  status  of  biopsied  tissue  not  only  forms  the  definitive 
diagnosis  but  constitutes  an  important  cornerstone  of  therapy  and  prognosis.  There  is, 
however,  a  need  to  add  useful  information  to  diagnoses  and  to  introduce  new  technologies 
that allow economical cancer detection to focus limited healthcare resources. In pathology
practice, widespread screening results in a large workload of biopsied men, in turn placing
an increasing demand on services. Operator fatigue is well documented and guidelines limit
the  workload  and  rate  of  examination  of  samples  by  a  single  operator.  Importantly,  newly 
detected  cancers  are  increasingly  moderate  grade  tumors  in  which  pathologist  opinion 
variation  complicates  decision-making. 

For  the  reasons  above,  there  is  an  urgent  need  for  automated  and  objective  pathology 
tools. We have sought to address these requirements through novel Fourier transform
infrared (FTIR) spectroscopy-based, computer-aided diagnoses for prostate cancer, and to
develop the required microscopy and software tools to enable their application. FTIR
spectroscopic  imaging  is  a  new  technique  that  combines  the  spatial  specificity  of  optical 
microscopy  and  the  biochemical  content  of  spectroscopy.  As  opposed  to  thermal  infrared 
imaging,  FTIR  imaging  measures  the  absorption  properties  of  tissue  through  a  spectrum 
consisting  of  (typically)  1024-2048  wavelength  elements  per  pixel.  Since  IR  spectra  reflect 
the  molecular  composition  of  the  tissue,  image  contrast  arises  from  differences  in 
endogenous chemical species. As opposed to visible microscopy of stained tissue, which
requires a human eye to detect changes, numerical computation is required to extract
information from IR spectra of unstained tissue. Extracted information, based on a computer
algorithm, is inherently objective and automated (Lattouf and Saad 2002; Fernandez
et al. 2005; Levin and Bhargava 2005; Bhargava et al. 2006).

Uniting spectroscopic imaging data and computer-aided diagnoses (CADx), we seek to
provide a new approach to pathology by automating the recognition of cancer in complex
tissue. This is an exciting paradigm in which disease diagnoses are objective and reproducible,
yet do not require any specialized reagents or human intervention. The first step
toward the creation of such CADx tools requires mechanisms for reliable and automated
tissue type classification. In this paper we demonstrate how genetics-based machine
learning tools can achieve such a goal. The need for interpretable learned models and efficient
processing of very large data sets has led us to rule-based models, which are easy to interpret,
and to genetics-based machine learning, whose methods are inherently massively parallel and have the
required scalability properties to address very large data sets. We present the method and
the efficiency enhancement techniques proposed to address automated tissue classification.
When such methods are pushed beyond the relatively small problems traditionally used to test
them, the need for efficient and scalable implementations becomes a key research topic
that needs to be addressed. We designed the proposed technique with such constraints in
mind: it is a modified version of an incremental genetics-based rule learner that exploits
massive parallelism, via the message passing interface (MPI), and efficient rule
matching using hardware-oriented operations. We name this system NAX. NAX is compared
to traditional and genetics-based machine learning techniques on an array of publicly
available data sets. We also report the initial results achieved using the proposed technique
when classifying prostate tissue.

The  remainder  of  the  paper  is  structured  as  follows.  We  present  an  overview  of  the 
problem  addressed  in  Sect.  2,  paying  special  attention  to  tissue  classification.  We  discuss  in 
Sect.  3  the  hurdles  that  traditional  genetics-based  machine  learning  implementations  face 
when  applied  to  very  large  data  sets.  Section  4  presents  our  solution  to  those  hurdles.  We 
also  describe  the  incremental  rule  learner  proposed  for  tissue  classification.  Last,  we 
summarize  results  on  publicly-available  data  sets  and  the  preliminary  results  for  tissue 
classification  on  a  prostate  tissue  microarray  in  Sect.  5.  Finally,  in  Sect.  6,  we  present 
conclusions  and  further  work. 


2  Biomedical  imaging  and  data  mining 

This section presents an overview of the problem addressed in this paper. We first introduce
infrared spectroscopic imaging as a potentially powerful tool for cancer diagnosis and
prognosis. Then, we explore the protocols that provide raw high-quality data for data
mining. Finally, we conclude by focusing on the key task, tissue classification, as applied
to prostate tissue.


2.1  Infrared  spectroscopy  and  imaging  for  cancer  diagnosis  and  prognosis 

Infrared spectroscopy is a well-established molecular technique and is widely used in
chemical analyses. The fundamental principle governing the response of any material is
that the vibrational modes of molecules are resonant in energy with photons in the mid-infrared
region (2-14 µm) of the electromagnetic spectrum. Hence, when photons of
energy that is resonant with the material's molecular composition are incident, a number
are absorbed. The number absorbed is directly proportional to the number of chemical
species that are excited. Hence, any material has a characteristic frequency-dependent
absorption profile called a spectrum. An infrared spectrum is often termed the "optical
fingerprint" of a material as it can help uniquely identify molecular composition (see
Fig. 1).
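The proportionality between absorption and the amount of absorbing species is usually written in the Beer-Lambert form. The text does not state this law explicitly, so the expression below is offered only as the standard textbook formulation; the symbols (absorbance $A$, wavenumber $\tilde{\nu}$, molar absorptivity $\varepsilon$, concentration $c$, path length $\ell$) are introduced here for illustration.

```latex
A(\tilde{\nu}) \;=\; -\log_{10}\frac{I(\tilde{\nu})}{I_0(\tilde{\nu})} \;=\; \varepsilon(\tilde{\nu})\, c\, \ell
```

Here $I_0$ and $I$ denote the incident and transmitted intensities. Under this form, the spectrum recorded at a pixel is the wavenumber-dependent absorbance of the tissue in that pixel.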

Researchers, including us, have contributed to developing an imaging version of spectroscopy
that is essentially similar to optical microscopy. In this mode of spectroscopy,
images are acquired in the manner of optical microscopy with one important difference.
Instead of measuring the intensity of three colors for a visible image, several thousand
intensity values are acquired at each pixel in the image as a function of wavelength
(a spectrum at each pixel). The resulting data set is three-dimensional (2 spatial and 1
spectral indices), typically of size 256 × 256 × 1024 but extending to sizes
such as 3500 × 3500 × 2048. Since each data point is stored as a 16-bit number, the
data size typically runs into several tens to hundreds of gigabytes.

Fig. 1 Conventional staining and automated recognition by chemical imaging. (A) Typical H&E stained
sample, in which structures are deduced from experience by a human. Highlighting specific regions in the
manner of H&E is possible using FTIR imaging without stains. (B) Absorption at 1080 cm⁻¹, commonly
attributed to nucleic acids, and (C) absorption commonly attributed to proteins of the stroma. The data obtained are three-dimensional (D), from
which spectra (E) or images at specific spectral features may be plotted
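For a sense of scale, the raw size of a single data cube follows directly from its dimensions and the 16-bit storage mentioned above. The sketch below is only an illustration; the function name `cube_bytes` and its parameters are hypothetical and do not come from the original software.

```c
#include <assert.h>
#include <stdint.h>

/* Raw size in bytes of a spectral data cube: nx * ny pixels, each holding
   n_bands spectral elements of bytes_per_value bytes (2 for 16-bit data).
   Hypothetical helper, introduced for illustration only. */
uint64_t cube_bytes(uint64_t nx, uint64_t ny, uint64_t n_bands,
                    uint64_t bytes_per_value)
{
    assert(bytes_per_value > 0);
    return nx * ny * n_bands * bytes_per_value;
}
```

Under these assumptions, `cube_bytes(256, 256, 1024, 2)` gives 134,217,728 bytes (128 MiB) for one small cube, while `cube_bytes(3500, 3500, 2048, 2)` gives 50,176,000,000 bytes (about 50 GB) for one large cube, so a handful of large acquisitions is enough to reach the volumes quoted in the text.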


2.2  Mining  the  spectra:  Two  sequential  problems 

Though  the  continued  development  of  fast  FTIR  microspectroscopy  represents  an  exciting 
opportunity  for  pathology,  handling  the  resultant  data  and  rapidly  providing  classifications 
remains a critical challenge. First, the sheer volume of data, potentially larger than 10 GB
a day, represents an organizational and retrieval challenge. Second, extraction of useful
information in short time periods requires the formulation of optimal protocols. Third, the
automated  cancer  segmentation  problem  is  very  complex  and  offers  a  number  of  routes  and 
levels  of  data  that  need  to  be  analyzed  to  determine  the  optimal  approach  for  application  in 
a  laboratory. 

The  typical  application  is  the  need  to  extract  information  from  the  data  set  such  that  it  is 
clinically  relevant.  Hence,  the  output  of  the  data  mining  algorithm  to  be  developed  is  well- 
bounded  and  clearly  defined.  For  example,  in  the  prostate  there  are  two  levels  of  interest.  In 
the  first  level,  the  pathologist  examines  the  tissue  to  determine  if  there  are  any  epithelial 
cells. Since more than 95% of prostate cancers arise in epithelial cells, transformations in
this class of cells form the diagnostic basis and a primary determinant of therapy. Other
cell  types  of  interest  are  lymphocytes  that  may  indicate  inflammation,  blood  vessel  density 
that  may  indicate  the  development  of  new  blood  supply  indicative  of  cancer  growth  and 
nerves  that  may  be  invaded  by  cancer  cells.  Hence,  any  automated  approach  to  pathology 
must  first  identify  cell  types  accurately.  The  second  step  in  pathology  follows.  Once 




Histopathology  using  genetics-based  machine  learning 


epithelial  cells  are  located,  their  spatial  patterns  are  indicative  of  disease  states.  In  our 
imaging  approach,  we  can  identify  both  spatial  patterns  as  well  as  chemical  patterns  in 
epithelial  cells.  Hence,  the  task  would  be  to  use  either  or  both  to  classify  disease.  In  this 
paper,  we  focus  only  on  the  accurate  identification/classification  of  tissue  types  as  the  first 
step  of  the  path  that  leads  to  obtaining  the  correct  pixels  of  epithelium. 


2.3  Tissue  classification  for  prostate  arrays 

Prostate tissue is structurally complex, consisting primarily of glandular ducts lined by
epithelial cells and supported by heterogeneous stroma. This tissue also contains blood
vessels, blood, nerves, ganglion cells, lymphocytes and stones (which are composed of
luminal secretions of cellular debris) that organize into structures measuring from tens to
hundreds of microns. These structures are readily observable within stained tissue using
bright-field microscopy at low to medium magnifications. Hence, in applying FTIR
imaging (Levin and Bhargava 2005), we obtain the common structural detail employed
clinically and, additionally, spectral information indicative of tissue biochemistry. As
histologic classes contain identical chemical components, infrared vibrational spectra are
similar but reveal small differences in specific absorbance features. The technique proposed
by Fernandez et al. (2005) examines each cell type's spectra and transforms each
spectrum into a vector of descriptive features, usually numbering in the hundreds. A complete
description of this process is beyond the scope of this paper and can be found elsewhere
(Fernandez et al. 2005). Each pixel (cell present in the slice of microarray under analysis)
has an assigned spatial position in the array, while the tissue type is assigned by a highly
experienced pathologist. Thus, tissue classification can be cast as a supervised
classification problem (Mitchell 1997), where all the attributes are real-valued and the class
is the tissue type. There are ten classes: epithelium, fibrous stroma, mixed stroma, muscle, stone,
lymphocytes, endothelium, nerve, ganglion, and blood. Figure 2 presents tissue types that
can be assigned by the pathologist by examining a stained image obtained after FTIR microspectroscopy
on the unstained tissue. Each marked pixel in the image becomes a training
example; hence, the smallest data sets usually contain hundreds of thousands of records
per array.


3  Larger,  bigger,  and  faster  genetics-based  machine  learning 

Bernado et al. (2001) presented a first empirical comparison between genetics-based
machine learning (GBML) techniques and traditional machine learning approaches. The
authors reported that GBML techniques were as competent as traditional techniques. Later,
Bacardit and Butz (2006) repeated the analysis, obtaining similar results. Most of the
experiments presented in both papers used publicly available data sets provided by the
University of California at Irvine repository (Merz and Murphy 1998). Most of these data
sets are defined over tens of features and, in the larger cases, up to a few thousand records.
However, key properties of GBML approaches are their intrinsic massive parallelism
and scalability. Cantu-Paz (2000) presented how efficient and accurate genetic
algorithms could be assembled, and Llora (2002) presented how such algorithms can be
efficiently used for machine learning and data mining. Nevertheless, there are elements that
need to be revisited when we want to efficiently apply GBML techniques to large data sets
such as the one described in the previous section.

Fig. 2 Tissue labeling provided by a pathologist for biopsy sections of human prostate
tissue. Each spot represents the section of a needle. Different colors represent different tissue types


GBML techniques require evaluating candidate solutions against the original data
set, that is, matching the candidate solutions (e.g., rules, decision trees, prototypes) against all
the instances in the data set. Regardless of the flavor used, Llora and Sastry (2006)
showed that, as the problem grows, rule matching governs the execution time. For small
data sets (tens of attributes and a few thousand records) the matching process takes
more than 85% of the overall execution time, marginalizing the contribution of the other
genetic operators. This number increases to 98% and above when we move to data sets
with a few hundred attributes and a few hundred thousand records. In other words, nearly all
of the time is spent evaluating candidate solutions. Each evaluation can be computed in
parallel. Moreover, the evaluation process may also be parallelized on very large data
sets by splitting and distributing the data across the computational resources. A detailed
description of the parallelization alternatives of GBML techniques can be found elsewhere
(Llora 2002).

Currently available off-the-shelf GBML methods and software distributions (Barry
and Drugowitsch 1997; Llora 2006) do not usually target large data sets. The two main
bottlenecks are large memory footprints and sequential-processing-oriented designs.
Generally speaking, they were designed to run on single-processor machines with
enough memory to fit the entire data set. Hence, designers did not pay much
attention to the memory footprint required to store the data set (usually completely
loaded into memory) and the population of candidate solutions. These large, complex
structures were geared to ease the programming effort, but they were not designed
for the efficient evaluation of candidate solutions. However, efforts have been
made to push GBML methods into domains that require processing large data sets.
Three different works need to be mentioned here. Flockhart (1995) proposed and
implemented GA-MINER, one of the earliest efforts to create data mining systems based
on GBML that scale across symmetric multi-processors and massively parallel
multi-processors. Flockhart (1995) reviewed different encoding and parallelization
schemes and conducted proper scalability studies. Llora (2002) explored how fine-grained
parallel genetic algorithms could become efficient models for data mining.
Theoretical analyses of performance and scalability were developed and validated with
proper simulations. Recently, Llora and Sastry (2006) explored how current hardware
can efficiently speed up rule matching against large data sets. These three approaches
are the basis of the incremental rule learner proposed in the next section to approach
very large data sets.

Another important issue in real-world problems is the class distribution. Most
real problems have a clear class imbalance. Recently, Orriols-Puig and Bernado-Mansilla
(2006) have revisited this issue, showing how GBML techniques can successfully
learn and maintain proper descriptions for minority classes. If the method is not designed
properly, descriptions of majority classes will tend to govern the learned models,
starving the descriptions of minority classes. Prostate tissue classification is a clear
example of extreme class imbalance. Figure 3 presents the tissue type class distribution.
The smallest tissue type has 64 records, whereas the larger classes have several tens of
thousands of records. Hence, the developed approaches must account for class size
variation.


Fig. 3 Tissue class distribution (x-axis: tissue type index after count sorting). Once the classes are reordered according to their
frequency in the data set, we can easily appreciate the extreme imbalance: the smallest tissue type has 64
records, whereas the larger classes have several tens of thousands of records


4 The road to tractability

We describe in this section the steps we took to design a GBML method (NAX) able to deal
with very large data sets with class imbalance. NAX evolves, one at a time, maximally
general and maximally accurate rules. Once a rule is evolved, the instances it covers are removed and another
maximally general and maximally accurate rule is evolved and added to the previously
stored ones, forming a decision list. This process continues until no uncovered instances are
left; it is also referred to as the sequential covering procedure (Cordon et al.
2001). Llora et al. (2005) showed that maximally general and maximally accurate rules
(Wilson 1995) could also be evolved using Pittsburgh-style Learning Classifier Systems.
Later, Llora et al. (2007) showed that competent genetic algorithms (Goldberg 2002)
evolve such rules quickly, reliably, and accurately. The rest of this section describes (1)
efficient implementation techniques to deal with very large data sets, (2) the impact of class
imbalance, and (3) the NAX algorithm proposed.
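The sequential covering loop described above can be sketched in a few lines of C. This is a minimal illustration under strong assumptions, not the NAX implementation: the data have a single attribute sorted in ascending order with distinct values, and the hypothetical function `learn_rule` replaces the genetic search with a trivial interval builder. The names `Rule`, `learn_rule`, `sequential_covering` and `MAX_RULES` are introduced here for illustration only.

```c
#include <assert.h>

#define MAX_RULES 16

/* One interval rule over a single attribute: lo <= x <= hi -> cls. */
typedef struct { float lo, hi; int cls; } Rule;

/* Hypothetical stand-in for the genetic search: starting at the first
   uncovered instance, extend an interval over consecutive uncovered
   instances that share its class. Assumes x is sorted in ascending order. */
static Rule learn_rule(const float *x, const int *y, const int *covered, int n)
{
    int s = 0;
    while (s < n && covered[s]) s++;
    assert(s < n);                 /* caller guarantees an uncovered instance */
    int e = s;
    while (e + 1 < n && !covered[e + 1] && y[e + 1] == y[s]) e++;
    Rule r = { x[s], x[e], y[s] };
    return r;
}

/* Sequential covering: learn a rule, remove the instances it covers, and
   repeat until none remain. Returns the length of the decision list. */
int sequential_covering(const float *x, const int *y, int n, Rule *list)
{
    int covered[64] = {0};
    int remaining = n, n_rules = 0;
    assert(n <= 64);
    while (remaining > 0 && n_rules < MAX_RULES) {
        Rule r = learn_rule(x, y, covered, n);
        for (int i = 0; i < n; i++)
            if (!covered[i] && x[i] >= r.lo && x[i] <= r.hi) {
                covered[i] = 1;    /* instance is removed from further learning */
                remaining--;
            }
        list[n_rules++] = r;       /* rules are kept in the order they were learned */
    }
    return n_rules;
}
```

The order of the rules in `list` matters: as in a decision list, the first rule that matches an instance determines its class.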


4.1  Efficient  implementations 

As introduced earlier, when dealing with very large data sets, and regardless of the flavor
of the GBML technique used, we may spend up to 98% of the computational cycles trying
to match rules to the original data set (Llora and Sastry 2006). Each solution evaluation is
independent of the others and, hence, can be computed in parallel. Moreover, even the
matching of a single rule (the representation we will use from now on) is highly parallel,
since conditions require performing simultaneous checks against different attributes per
record. Thus, an efficient implementation can take advantage of parallelizing both elements.


4.1.1  Exploiting  the  hardware  acceleration 

Recently, multimedia and scientific applications have pushed CPU manufacturers to include
support for vector instructions in their processors again. Both application areas require
heavy calculations based on vector arithmetic. Simple vector operations such as add or
product are repeated over and over. During the 1980s and 1990s, supercomputers such as Cray
machines were able to issue hardware instructions that enabled basic vector arithmetic. A
more constrained scheme, however, has made its way into general-purpose processors
thanks to the push of multimedia and scientific applications. The main chip manufacturers
(IBM, Intel, and AMD) have introduced vector instruction sets (Altivec, SSE3, and
3DNow+) that allow hardware vector operations over packs of 128 bits. We will focus
on a subset of instructions that are able to deal with floating-point vectors. This subset of
instructions manipulates groups of four floating-point numbers. These instructions are the
basis of the fast rule matching mechanism proposed.

Our goal is to evolve a set of rules that correctly classifies the current data set from
prostate tissue. Using a knowledge representation based on rules allows us to inspect the
learned model, gaining insight into the biological problem as well. All the attributes of the
domain are real-valued, and the conditions of the rules need to be able to express conditions
in real-valued spaces. We use a rule encoding similar to the one proposed by Wilson (2000b), a
variation of the original work proposed by Wilson (2000a) and later reviewed by Stone and
Bull (2003), which is widely used in the GBML community. Rules express the conjunction of
tests across attributes. Each test may be defined in multiple flavors but, without loss of
generality, we picked a simple interval-based one. A simple example of an if-then rule
could be expressed as follows:

$1.0 < a_0 < 2.3 \wedge \cdots \wedge 10.0 < a_n < 23 \rightarrow c_1$    (1)

where the condition is the conjunction of the different attribute tests and the outcome is the
predicted class, a tissue type. We also allow a special condition, don't care, which
always returns true, allowing condition generalization. The rule below illustrates an
example of a generalized rule.

$1.0 < a_0 < 2.3 \wedge -3.0 < a_3 < 2 \rightarrow c_1$    (2)


All attributes except $a_0$ and $a_3$ were marked as don't care.

Each condition can be encoded using two floating-point numbers, where $\alpha_i$
contains the lower bound of the condition and $\omega_i$ its upper bound. Thus, the condition
$\alpha_i < a_i < \omega_i$ just requires storing two floating-point numbers. For efficiency reasons we
store them in two separate vectors, one containing the lower bounds and the other containing
the upper bounds. The position in a vector indicates the attribute being tested. The
don't care condition is simply encoded as $\alpha_i > \omega_i$ and, hence, we do not need to store any
extra information.

Matching a rule requires performing the individual condition tests before the final AND
operation can be computed. Vector instruction sets improve the performance of this process
by performing four operations at once. In effect, this process may be regarded as four
pipelines running in parallel. The process can be further improved by stopping the matching
process as soon as one test fails, since that failure makes the whole condition false.
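The encoding and the early-exit matching just described can be illustrated with a short scalar sketch. It is a simplified, self-contained variant written for this explanation rather than the code of the figures: the names `set_dont_care` and `rule_matches` are hypothetical, and the bounds are treated as inclusive, following the test used in Fig. 4.

```c
#include <assert.h>
#include <stddef.h>

/* Mark attribute k of a rule as don't care by making its interval empty
   (lower bound greater than upper bound), so no extra flag is stored. */
void set_dont_care(float *lb, float *ub, size_t k)
{
    lb[k] = 1.0f;
    ub[k] = 0.0f;
}

/* Returns 1 when the record satisfies every active interval test.
   lb[k] and ub[k] bound attribute k; the loop stops at the first failure. */
int rule_matches(const float *lb, const float *ub, const float *rec, size_t n)
{
    assert(lb && ub && rec);
    for (size_t k = 0; k < n; k++) {
        int active = lb[k] <= ub[k];   /* lb > ub encodes don't care */
        if (active && (rec[k] < lb[k] || rec[k] > ub[k]))
            return 0;                  /* one failed test falsifies the conjunction */
    }
    return 1;
}
```

With four attributes, a rule in the spirit of Eq. (2) stores bounds for attributes 0 and 3 and calls `set_dont_care` for attributes 1 and 2; any values in those two positions of a record are then ignored by `rule_matches`.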

Figure 4 presents a C implementation of the proposed hardware-supported rule matching.
The code assumes that the two vectors containing the upper and lower bounds are provided
and that records are stored in a two-dimensional matrix. Figure 5 presents the vectorized
implementation of the code presented in Fig. 4 using SSE2 instructions. Exploiting the
available hardware can speed up the matching process by a factor of 3 to 3.5, as also shown
elsewhere (Llora and Sastry 2006).
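Because the listings in Figs. 4 and 5 depend on types that are not shown (RuleSet, InstanceSet), a self-contained variant of the four-at-a-time test may be easier to experiment with. The sketch below is an assumption-laden illustration and not the authors' code: the function name `rule_matches_vec` is hypothetical, it uses unaligned loads and the single-precision comparison intrinsics from `<xmmintrin.h>`, and it falls back to a scalar loop when the compiler does not report SSE support.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Returns 1 if the record satisfies every interval test of the rule.
   n must be a multiple of 4; lb[k] > ub[k] marks a don't-care attribute. */
int rule_matches_vec(const float *lb, const float *ub, const float *rec, int n)
{
    assert(n % 4 == 0);
    for (int k = 0; k < n; k += 4) {
#if defined(__SSE__)
        __m128 vlb = _mm_loadu_ps(lb + k);
        __m128 vub = _mm_loadu_ps(ub + k);
        __m128 vin = _mm_loadu_ps(rec + k);
        __m128 active  = _mm_cmple_ps(vlb, vub);            /* lb <= ub */
        __m128 outside = _mm_or_ps(_mm_cmplt_ps(vin, vlb),  /* rec < lb */
                                   _mm_cmpgt_ps(vin, vub)); /* rec > ub */
        /* A lane fails when its test is active and the value lies outside. */
        if (_mm_movemask_ps(_mm_and_ps(active, outside)) != 0)
            return 0;                                       /* early exit */
#else
        for (int t = k; t < k + 4; t++)
            if (lb[t] <= ub[t] && (rec[t] < lb[t] || rec[t] > ub[t]))
                return 0;
#endif
    }
    return 1;
}
```

Rules whose number of attributes is not a multiple of four can be padded with don't-care entries, which is one plausible reading of the corrected dimension used in the figures.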


4.1.2  Massive  parallelism 

Since most of the time is spent on the evaluation of candidate rules when dealing with large
data sets, our next goal was to find a parallelization model that could take advantage of this
peculiarity. Due to the quasi-embarrassingly parallel (Grama et al. 2003) nature of the candidate
rule evaluation, we designed a coarse-grain parallel model for distributing the
evaluation load. Cantu-Paz (2000) proposed several schemes, showing the importance of
the trade-off between computation time and time spent communicating. When designing
the parallel model, we focused on minimizing the communication cost. Usually, a feasible
solution would be a master/slave one, since the computation time is much larger than the
communication time. However, GBML approaches tend to use rather large populations,
forcing us to send rule sets to the evaluation slaves and collect the resulting fitness. These
schemes also increase the sequential sections that cannot be parallelized, threatening the
overall speedup of the parallel implementation as a result of Amdahl's law (Amdahl 1967).
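The effect of the sequential sections can be made concrete with Amdahl's law, which bounds the speedup on p processors by 1 / ((1 - f) + f / p) when a fraction f of the run time can be parallelized. The sketch below simply evaluates that formula; the function name `amdahl_speedup` is hypothetical, and treating the roughly 98% matching share quoted earlier as the parallelizable fraction is an assumption made for illustration.

```c
#include <assert.h>

/* Amdahl's law: speedup on p processors when a fraction f of the
   run time (0 <= f <= 1) can be parallelized perfectly. */
double amdahl_speedup(double f, int p)
{
    assert(f >= 0.0 && f <= 1.0 && p >= 1);
    return 1.0 / ((1.0 - f) + f / (double)p);
}
```

Under that assumption, `amdahl_speedup(0.98, 16)` is about 12.3, and no number of processors can push the speedup past 1 / (1 - 0.98) = 50, which is why any growth of the sequential part is costly.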

To  minimize  such  communication  cost,  each  processor  runs  an  identical  NAX  algorithm. 
They  are  all  seeded  in  the  same  manner,  hence,  performing  the  same  genetic  operations 
and  only  differing  in  the  portion  of  the  population  being  evaluated.  Thus,  the  population  is 

 1.  void match_seq_rule_set ( RuleSet * rs, InstanceSet is, int iDim, int iRows ) {
 2.    int i,j,k,iCnt,iClsIdx,iGround,iPred;
 3.    register int iMatcheable;
 4.    Instance ins;
 5.
 6.    iClsIdx = rs->iCorrectedDim;
 7.    clean_fitness_rules_set(rs);
 8.    for ( i=0 ; i<iRows ; i++ ) {
 9.      ins = is[i];
10.      iPred = -1;                              /* no rule has matched yet */
11.      for ( j=0 ; iPred==-1 && j<rs->iLen ; j++ ) {
12.        iMatcheable = 1;
13.        for ( iCnt=0,k=j*(rs->iCorrectedDim+VBSIF) ;
14.              iMatcheable && k<j*(rs->iCorrectedDim+VBSIF)+rs->iDim ;
15.              k++,iCnt++ ) {
16.          iMatcheable = iMatcheable &&
17.            !( (rs->pfLB[k]<=rs->pfUB[k]) &&
18.               ( ins[iCnt]<rs->pfLB[k] || ins[iCnt]>rs->pfUB[k] ) );
19.        }
20.        if ( iMatcheable )
21.          iPred = rs->pfLB[j*(rs->iCorrectedDim+VBSIF)+rs->iCorrectedDim];
22.      }
23.      iPred = (iPred==-1)?rs->iClasses:iPred;  /* default class when no rule matches */
24.      iGround = (int)ins[iClsIdx];
25.      rs->pConfMat[iGround][iPred]++;
26.    }
27.  }


Fig. 4 This figure presents a sequential implementation of the rule matching process in C. A rule set is
matched against a data set. Lines 16, 17, and 18 implement the condition test for one attribute. The
implementation also computes the confusion matrix that contains the ground truth versus predicted class


treated as a collection of chunks where each processor evaluates its own assigned chunk,
sharing the fitness of the individuals in its chunk with the rest of the processors. Fitness values can
be packed and broadcast together, maximizing the occupation of the underlying
frames used by the network infrastructure. Moreover, this approach also removes the need
for sending the actual rules back and forth between processors, as a master/slave approach
would require, thus reducing the communication to the bare minimum: the fitness.
Figure 6 presents a conceptual scheme of the parallel architecture of NAX.

To implement the model presented in Fig. 6, we used C and the message passing interface
(MPI), specifically the OpenMPI implementation (Gabriel et al. 2004). Figure 7 shows the
code in charge of the parallel evaluation. Each processor computes which individuals are
assigned to it. Then it computes their fitness and, finally, it broadcasts the computed
fitness. The rest of the process is left untouched and, apart from the cooperative evaluation, all
the processors end up generating the same evolutionary trace.
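The first step of the cooperative evaluation, deciding which individuals a processor evaluates, can be sketched without any MPI calls. The function below is a hypothetical illustration that assumes contiguous chunks of nearly equal size; it is not the code of Fig. 7. In an MPI program, `rank` and `nprocs` would typically come from `MPI_Comm_rank` and `MPI_Comm_size`, and the fitness values of all chunks could be shared with a collective such as `MPI_Allgatherv`.

```c
#include <assert.h>

/* Half-open range [*first, *last) of individuals evaluated by processor
   `rank` out of `nprocs`, for a population of `pop_size` individuals.
   Any remainder is spread over the lowest ranks, one individual each. */
void chunk_range(int rank, int nprocs, int pop_size, int *first, int *last)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs && pop_size >= 0);
    int base = pop_size / nprocs;
    int extra = pop_size % nprocs;
    *first = rank * base + (rank < extra ? rank : extra);
    *last = *first + base + (rank < extra ? 1 : 0);
}
```

For a population of 10 individuals on 4 processors this yields the ranges [0,3), [3,6), [6,8) and [8,10). Since every processor can compute every range locally, only fitness values need to travel over the network.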


4.2  Rule  sets  as  individuals 

One main characteristic of the so-called Pittsburgh-style learning classifier systems, a
particular type of GBML, is that individuals encode a rule set (Goldberg 1989; Llora and
Garrell 2001; Goldberg 2002). Thus, evolutionary mechanisms directly recombine one rule
set against another one. For classification tasks of moderate complexity, the rule sets are

 1.  #define VEC_MATCH(vecFLB,fLB,vecFUB,fUB,vecINS,fIN,vecTmp,vecOne,vecRes) {\
 2.    vecFLB = _mm_load_ps(fLB);\
 3.    vecFUB = _mm_load_ps(fUB);\
 4.    vecINS = _mm_load_ps(fIN);\
 5.    \
 6.    vecRes = (__m128i)_mm_cmpge_ps(vecFUB,vecFLB);\
 7.    vecTmp = _mm_or_si128(\
 8.      (__m128i)_mm_cmpgt_ps(vecFLB,vecINS),\
 9.      (__m128i)_mm_cmpgt_ps(vecINS,vecFUB)\
10.    );\
11.    vecRes = _mm_andnot_si128(_mm_and_si128(vecRes,vecTmp),vecOne);\
12.  }
13.
14.  void match_rule_set ( RuleSet * rs, InstanceSet is, int iDim, int iRows ) {
15.    int i,j,k,iCnt,iClsIdx,iGround,iPred;
16.    register int iMatcheable;
17.    Instance ins;
18.
19.    __m128i vecRes,vecTmp,vecOne;
20.    __m128 vecFLB,vecFUB,vecINS;
21.
22.    vecOne = (__m128i){-1,-1};                 /* all bits set */
23.
24.    iClsIdx = rs->iCorrectedDim;
25.    clean_fitness_rules_set(rs);
26.    for ( i=0 ; i<iRows ; i++ ) {
27.      // Classify the instance
28.      ins = is[i];
29.      iPred = -1;
30.      for ( j=0 ; iPred==-1 && j<rs->iLen ; j++ ) {
31.        iMatcheable = 1;
32.        for ( iCnt=0,k=j*(rs->iCorrectedDim+VBSIF) ;
33.              iMatcheable && k<j*(rs->iCorrectedDim+VBSIF)+rs->iDim ;
34.              k+=VBSIF,iCnt+=VBSIF ) {
35.          VEC_MATCH(vecFLB,&(rs->pfLB[k]),
36.                    vecFUB,&(rs->pfUB[k]),
37.                    vecINS,&(ins[iCnt]),vecTmp,vecOne,vecRes);
38.          iMatcheable = 0xFFFF==_mm_movemask_epi8(vecRes);
39.        }
40.        if ( iMatcheable )
41.          iPred = rs->pfLB[j*(rs->iCorrectedDim+VBSIF)+rs->iCorrectedDim];
42.      }
43.      iPred = (iPred==-1)?rs->iClasses:iPred;  /* default class when no rule matches */
44.      iGround = (int)ins[iClsIdx];
45.      rs->pConfMat[iGround][iPred]++;
46.    }
47.  }


Fig.  5  This  figure  presents  a  vectorized  implementation  of  the  rule  matching  process  presented  in  Fig.  4. 
Lines  1-12  implement  the  parallelized  test  against  four  attributes  using  vector  instructions.  The  code  is 
written  using  C  intrinsics  for  SSE2  compatible  architectures.  This  code  runs  on  P4  or  newer  Intel  processors 
and  Opteron  or  Athlon  64  AMD  processors 


not  large.  However,  for  complex  problems,  the  potential  number  of  required  rules  to  ensure 
proper  classification  may  need  large  amounts  of  memory  that  become  prohibitive.  The 
requirements  increase  even  further  in  the  presence  of  noise  (Llora  and  Goldberg  2003). 
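As a point of reference, the per-attribute test that the VEC_MATCH macro of Fig. 5 evaluates four lanes at a time can be written in scalar C. This is only a sketch of one reading of the figure, not the authors' code: the names `attribute_matches` and `rule_matches` are hypothetical, and it assumes that an interval whose upper bound is not greater than its lower bound acts as a wildcard, which is what the and-not combination in the macro appears to encode.

```c
#include <stddef.h>

/* Scalar reading of one VEC_MATCH lane (hypothetical helper, not from the paper).
 * The test fails only when the interval is active (ub > lb) and the value lies
 * outside [lb, ub]; an inactive interval (ub <= lb) accepts any value. */
int attribute_matches(float lb, float ub, float x)
{
    int active  = ub > lb;
    int outside = (lb > x) || (x > ub);
    return !(active && outside);
}

/* A rule matches an instance when every attribute test passes.
 * The loop stops at the first failing attribute. */
int rule_matches(const float *lb, const float *ub, const float *ins, size_t dim)
{
    for (size_t i = 0; i < dim; i++)
        if (!attribute_matches(lb[i], ub[i], ins[i]))
            return 0;
    return 1;
}
```

A scalar version like this is mainly useful as a correctness oracle when testing a vectorized matcher on the same bounds and instances.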
Fig. 6 This figure illustrates the parallel model implemented. Each processor runs the identical
NAX algorithm. The processors differ only in the portion of the population being evaluated. The population
is treated as a collection of chunks, where each processor evaluates its own assigned chunks and shares the
fitness of these individuals with the rest of the processors. This approach minimizes the communication cost


Parallelization may not help much if we need to send large rule sets across the communication
network. For such reasons, GBML techniques work very well on problems of moderate
complexity (Bernado et al. 2001; Bacardit and Butz 2006). However, they need
to be modified to deal with complex and large data sets, and to avoid the limits
imposed by the issues mentioned above.


4.3  NAX:  Incremental  rule  learning  for  very  large  data  sets 

An incremental rule learning approach may reduce the memory footprint by
evolving only one rule at a time. However, one rule by itself cannot solve complex
problems. For this reason, each evolved rule is added to the final rule set, and the
examples it covers are removed from the current training set. The process is repeated
until no instances are left in the training set. This approach, already introduced by
Cordon et al. (2001) and later also used by Bacardit and Krasnogor (2006), keeps the
memory footprint relatively small, which makes it feasible to process large data sets
such as the prostate tissue classification data set. However, an incremental approach
to the construction of the rule set requires paying special attention to the way rules are
evolved. For each run of the genetic algorithm used to evolve a rule, we would like to
obtain a maximally general and maximally accurate rule, that is, a rule that covers the
maximum number of examples without making mistakes (Wilson 1995).
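The incremental loop described above can be sketched in a few lines of C. Everything in this sketch is an illustrative assumption rather than NAX code: the types `Rule`, `Example`, and `RuleLearner`, the one-dimensional interval rule, and `toy_learner`, which merely stands in for a full run of the genetic algorithm.

```c
#include <stddef.h>

typedef struct { float lb, ub; int cls; } Rule;     /* hypothetical 1-D interval rule */
typedef struct { float x; int cls; } Example;       /* hypothetical training example  */
typedef Rule (*RuleLearner)(const Example *train, size_t n); /* stands in for a GA run */

/* Drops the examples covered by r, compacting the array in place.
 * Returns the number of examples left. */
size_t remove_covered(Example *train, size_t n, Rule r)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (train[i].x < r.lb || train[i].x > r.ub)
            train[kept++] = train[i];
    return kept;
}

/* Learns one rule at a time, appends it to out, and removes what it covers. */
size_t learn_rule_set(Example *train, size_t n, RuleLearner learn,
                      Rule *out, size_t max_rules)
{
    size_t n_rules = 0;
    while (n > 0 && n_rules < max_rules) {
        Rule r = learn(train, n);
        size_t left = remove_covered(train, n, r);
        if (left == n)
            break;              /* the rule covered nothing: stop instead of looping */
        out[n_rules++] = r;
        n = left;
    }
    return n_rules;
}

/* Toy learner for illustration only: covers exactly the first remaining example. */
Rule toy_learner(const Example *train, size_t n)
{
    (void)n;
    Rule r = { train[0].x, train[0].x, train[0].cls };
    return r;
}
```

Only the current rule and the shrinking training set are held by the loop, which is the memory argument made in the text; the resulting array of rules is read in order, as a decision list.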
 1.  void evaluate_population ( Population * pp, InstanceSet is, int iDim, int iRows )
 2.  {
 3.      int i;
 4.
 5.      /* Compute the fragments of this processor */
 6.      int iFrag = pp->iLen/FCS_processes;
 7.      int iInit = FCS_process_id*iFrag;
 8.      int iLast = (FCS_process_id+1==FCS_processes)?
 9.                  pp->iLen:
10.                  (FCS_process_id+1)*iFrag;
11.      int iCnt = 0;
12.      int j,k,l;
13.
14.      /* Create the bucket for the broadcast */
15.      float faFit[2*iFrag];
16.      float faTmp[2*iFrag];
17.
18.      /* Evaluate the given chunk assigned to the processor */
19.      for ( i=iInit,iCnt=0 ; i<iLast ; i++,iCnt++ ) {
20.          match_rule_set(pp->prs[i],is,iDim,iRows);
21.          compute_raw_accuracy_fitness_rule_set(pp->prs[i]);
22.          faFit[iCnt] = pp->prs[i]->fFitness;
23.      }
24.
25.      /* Broadcast each of the chunks */
26.      for ( i=0 ; i<FCS_processes ; i++ ) {
27.          MPI_Bcast((i==FCS_process_id)?faFit:faTmp,iCnt,MPI_FLOAT,i,MPI_COMM_WORLD);
28.          if ( i!=FCS_process_id )
29.              for ( l=0,j=i*iFrag,k=(i+1)*iFrag ; j<k ; j++,l++ )
30.                  pp->prs[j]->fFitness = faTmp[l];
31.      }
32.  }


Fig. 7 This figure presents an implementation of the proposed parallel evaluation scheme using C and MPI.
The piece of code presented here is the only one modified to provide such parallelization capabilities.
Each processor computes which individuals are assigned to it (lines 6-10), then it computes the fitness (lines
18-23), and then it simply broadcasts the computed fitness (lines 26-31)
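The chunk assignment computed in lines 6-10 of Fig. 7 can be isolated as a small pure function, which makes the boundary cases easy to check without MPI. The `Chunk` type and the name `chunk_for` are hypothetical and introduced only for this sketch.

```c
/* Half-open range [first, last) of individuals evaluated by one process. */
typedef struct { int first; int last; } Chunk;

/* Mirrors the arithmetic of lines 6-10 in Fig. 7: every process gets
 * pop_len / n_procs individuals, and the last process also takes the remainder. */
Chunk chunk_for(int pop_len, int n_procs, int rank)
{
    int frag = pop_len / n_procs;
    Chunk c;
    c.first = rank * frag;
    c.last  = (rank + 1 == n_procs) ? pop_len : (rank + 1) * frag;
    return c;
}
```

With 10 individuals on 3 processes this gives the ranges [0, 3), [3, 6), and [6, 10). When the population size is not a multiple of the number of processes, the last chunk is longer than the others, so a receiver would presumably need the sender's chunk length, not its own, when copying the broadcast fitness values back.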


Llora et al. (2007) have shown that evolving such rules is possible. In order to promote
maximally general and maximally accurate rules à la XCS (Wilson 1995), we compute the
accuracy ($\alpha$) and the error ($\varepsilon$) of a rule (Llora et al. 2005). The accuracy is
the proportion of overall examples correctly classified, and the error is the proportion of
incorrect classifications issued. For simplicity, we use the proportion of correctly issued
classifications instead, which simplifies the final fitness calculation. Let $n_{t+}$ be the number of
positive examples correctly classified, $n_{t-}$ the number of negative examples correctly
classified, $n_m$ the number of times a rule has been matched, and $n_t$ the number of examples
available. Using these values, the accuracy and error of a rule $r$ can be computed as:

$$\alpha(r) = \frac{n_{t+}(r) + n_{t-}(r)}{n_t} \tag{3}$$

$$\varepsilon(r) = \frac{n_{t+}(r)}{n_m(r)} \tag{4}$$

Once the accuracy and error of a rule are known, the fitness can be computed as
follows.

$$f(r) = \alpha(r) \cdot \varepsilon(r)^{\gamma} \tag{5}$$

where $\gamma$ is the error penalization coefficient. The above fitness measure favors rules with a
good classification accuracy and a low error, that is, maximally general and maximally accurate
rules. By increasing $\gamma$, we can bias the search towards correct rules. This is an important
element because assembling a rule set based on accurate rules guarantees the overall
performance of the assembled rule set. In our experiments, we have set $\gamma$ to 18 to strongly
bias the search toward maximally general and maximally accurate rules.
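To make the effect of the penalization coefficient concrete, the fitness can be sketched in C. The sketch assumes the reading $\alpha(r) = (n_{t+} + n_{t-})/n_t$, $\varepsilon(r) = n_{t+}/n_m$, and $f(r) = \alpha(r)\,\varepsilon(r)^{\gamma}$ with an integer $\gamma$; the struct and function names are hypothetical, and returning 0 for a rule that never matched is a choice made here, not something stated in the text.

```c
/* Counts for one rule; field names follow the symbols used in the text. */
typedef struct {
    double n_tp;   /* n_{t+}: positive examples correctly classified */
    double n_tn;   /* n_{t-}: negative examples correctly classified */
    double n_m;    /* number of times the rule matched               */
    double n_t;    /* number of examples available                   */
} RuleCounts;

double rule_accuracy(RuleCounts c)
{
    return (c.n_tp + c.n_tn) / c.n_t;
}

/* Proportion of correctly issued classifications (the "error" term of Eq. 4).
 * A rule that never matched gets 0 here, which is an assumption of this sketch. */
double rule_error(RuleCounts c)
{
    return c.n_m > 0.0 ? c.n_tp / c.n_m : 0.0;
}

/* f(r) = alpha(r) * epsilon(r)^gamma, with gamma restricted to an integer. */
double rule_fitness(RuleCounts c, unsigned gamma)
{
    double e = rule_error(c);
    double f = rule_accuracy(c);
    for (unsigned i = 0; i < gamma; i++)
        f *= e;
    return f;
}
```

With $\gamma = 18$, a rule whose issued classifications are only 80% correct has its fitness scaled by roughly $0.8^{18} \approx 0.018$, whereas a rule that makes no mistakes keeps its full accuracy as fitness.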

NAX's efficient implementation of the evolutionary process is based on the techniques
described above: hardware acceleration (Sect. 4.1.1) and coarse-grain parallelism
(Sect. 4.1.2). The genetic algorithm used was a modified version of the simple genetic
algorithm (Goldberg 1989) using tournament selection (s = 4), one-point crossover, and
mutation based on generating new random boundary elements.
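Tournament selection with s = 4 can be sketched as follows. The text does not say whether competitors are drawn with or without replacement, so this sketch assumes drawing with replacement; the function name and the use of `rand()` are likewise illustrative assumptions.

```c
#include <stdlib.h>

/* Draws s competitors uniformly at random (with replacement, an assumption of
 * this sketch) and returns the index of the fittest one. */
int tournament_select(const double *fitness, int pop_len, int s)
{
    int best = rand() % pop_len;
    for (int i = 1; i < s; i++) {
        int challenger = rand() % pop_len;
        if (fitness[challenger] > fitness[best])
            best = challenger;
    }
    return best;
}
```

Larger tournament sizes increase the selection pressure, because the fittest of more competitors is kept each time.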


5  Experiments 

This section presents the results achieved using NAX. To allow the reader to compare with
other techniques, we compare the results obtained using NAX on small data sets provided by
the UCI repository (Merz and Murphy 1998) to other well-known supervised learning
algorithms. Finally, we present the first results on the prostate tissue prediction obtained
using NAX. Results focus on the viability of the NAX approach.


5.1  Some  UCI  repository  data  sets 

The UCI repository (Merz and Murphy 1998) provides several data sets for different
machine learning problems. These data sets have been widely used to test traditional
machine learning and GBML techniques. Table 1 lists the data sets used. Due to the nature
of the prostate tissue type classification, we only chose data sets with numeric attributes.
Three of these data sets are of particular interest: (1) son, by far the one with the largest
dimensionality, (2) gls, the one with the largest number of classes, and (3) tao, proposed by
Llora and Garrell (2001), which has complex and non-linear boundaries.


Table 1 Summary of the data sets used in the experiments

ID   Data set                 Size  Missing     Numeric     Nominal     Classes
                                    values (%)  attributes  attributes
bre  Wisconsin Breast Cancer   699  0.3          9          -           2
bpa  Bupa Liver Disorders      345  0.0          6          -           2
gls  Glass                     214  0.0          9          -           6
h-s  Heart Stats-Log           270  0.0         13          -           2
ion  Ionosphere                351  0.0         34          -           2
irs  Iris                      150  0.0          4          -           3
son  Sonar                     208  0.0         60          -           2
tao  Tao                      1888  0.0          2          -           2
win  Wine                      178  0.0         13          -           3
Table 2 Experimental results: percentage of correct classifications and standard deviation from stratified
ten-fold cross-validation runs

ID   0-R            C4.5            NAX
bre  65.52 ± 1.16   95.42 ± 1.69    96.43 ± 1.72
bpa  57.97 ± 1.23   65.70 ± 3.84    64.07 ± 8.36
gls  35.51 ± 4.49   65.89 ± 10.47   68.02 ± 8.69
h-s  55.55 ± 0.00   76.30 ± 5.85    75.56 ± 9.39
ion  64.10 ± 1.19   89.74 ± 5.23    89.19 ± 5.27
irs  33.33 ± 0.00   95.33 ± 3.26    94.67 ± 4.98
son  53.37 ± 3.78   71.15 ± 8.54    73.62 ± 9.72
tao  49.79 ± 0.17   95.07 ± 2.11    97.41 ± 0.92
win  39.89 ± 3.22   93.82 ± 2.85    94.34 ± 6.09

Paired t-test comparisons showed no statistically significant differences between C4.5 and NAX results.
0-R results are provided only as a baseline


We could have chosen complex algorithms as baselines for NAX. However, we would
not be able to use them to repeat the experimentation on the prostate tissue classification
domain. The algorithms used in the comparison presented in Table 2 were 0-R (Holte
1993), a simple baseline based on majority class classification, and C4.5 (Quinlan 1993).
Results show the percentage of correct classifications and the standard deviation from stratified
ten-fold cross-validation runs. Paired t-test comparisons showed no statistically significant
differences between the pruned tree produced by C4.5 and NAX results. These experiments
also helped validate the distributed implementation proposed by NAX. Further results on
empirical comparisons can be found elsewhere (Bernado et al. 2001; Bacardit and Butz
2006).


5.2  Prostate  tissue  classification 

With the previous results at hand, we ran NAX against the prostate tissue classification data
set. The original data set is defined by 93 attributes. In this paper, however, we used the
reduced version of this data set proposed by Fernandez et al. (2005), which contains 20
selected attributes out of the 93 available. The data set is formed by 171,314 records. Our goal
was to explore how well NAX could generalize over unseen tissue, which is the first step
towards addressing the cancer prediction problem. The other reason that motivated such
experimentation was to achieve accuracy results similar to the ones published earlier by
Fernandez et al. (2005) using a modified Bayes technique. If NAX could perform at the
same level, we would also obtain a set of rules of interest to the spectroscopist. The
interpretation of the rules will provide insight on how to interpret the models provided by
NAX, which could not be done with the models used earlier by Fernandez et al. (2005).

We conducted stratified 10-fold cross-validation experiments to measure the generalization
capabilities of NAX for this problem. Since the problem was rather small (larger
data sets are being prepared to be run at the supercomputing facilities provided by the
National Center for Supercomputing Applications), we ran the ten-fold cross-validation
runs on a 3 GHz dual-core Pentium D computer with 4 GB of RAM. NAX took advantage of
the hardware support to speed up the matching process and used two MPI processes to
parallelize, as introduced in Fig. 6, the evaluation of the overall population. Each fold
took about one hour to complete, with the entire classification lasting less than half a day.
We conducted a simple test of adding a second computer with an identical configuration.
The overall time for cross-validation was reduced to half. Rough estimates, which will
be better measured when larger experiments are conducted on NCSA supercomputers, show
that the sequential portion is around 1:1000 for this small data set. Numbers get better as
the data set grows, which suggests that we will be able to process very large data sets
and efficiently exploit larger numbers of processors.
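The estimate of the sequential portion can be related to the attainable speedup through Amdahl's law (Amdahl 1967). The sketch below assumes that "around 1:1000" means a sequential fraction of about 0.001, which is an interpretation rather than something the text states; the function name is hypothetical.

```c
/* Amdahl's law: speedup on p processors when a fraction s of the work
 * is inherently sequential. */
double amdahl_speedup(double s, int p)
{
    return 1.0 / (s + (1.0 - s) / (double)p);
}
```

Under that assumption the predicted speedup is about 1.998 on two processors, consistent with the observed halving of the cross-validation time, and about 500 on 1,000 processors.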

We propose another measure of effectiveness, namely how many records can be
processed per second. Using a single processor with the hardware acceleration mechanisms
built into NAX, and the evolved rule set formed by 1,028 rules, the average throughput was
around 60,000 records per second. For the prostate tissue classification, it took less than
three seconds to classify the entire data set. Once the rule set is learnt, the classification
problem falls again into the category of embarrassingly parallel problems (Grama et al.
2003). Since no communication is needed, the speedup grows linearly with the number of
processors added, given proper rule set replication and data set chunking. Thus, with
the dual-core box used we were able to double the throughput (120,000 records per
second) by chunking the data set and using both processors.
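The timing quoted above follows directly from the size of the data set (171,314 records) and the reported throughput. As a rough check, assuming the throughput stays constant over the whole data set:

```latex
t_{1} \approx \frac{171{,}314\ \text{records}}{60{,}000\ \text{records/s}} \approx 2.86\ \text{s},
\qquad
t_{2} \approx \frac{171{,}314\ \text{records}}{120{,}000\ \text{records/s}} \approx 1.43\ \text{s}
```

Here $t_1$ and $t_2$ denote the classification time with one and two processors, respectively; both symbols are introduced only for this check.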

The previous results show the benefits of hardware acceleration and parallelization, but
NAX was also able to achieve very competitive classification accuracy in generalization,
correctly classifying 97.09 ± 0.09% of the records (pixels) during the stratified ten-fold
cross-validation. Figure 8 presents the regenerated prostate tissue classification image
presented in Fig. 2 using a rule set assembled by NAX. Figure 8b presents the incorrectly
classified pixels. Most of the mistakes made by the rule set involve similar tissues with few
training records available. This trend was also shown elsewhere (Fernandez et al. 2005).
C4.5 does not provide any statistically significant improvement (only a marginal, not
statistically significant, 0.7%) and produced large decision trees with more than 5,000
leaves, not to mention its lack of scalability when compared to NAX.

The rule set assembled by NAX represents an incremental assembling of maximally
general and maximally accurate rules. Thus, we can compute how the accuracy of such an
ensemble improves as new rules are added. Figure 9 presents the overall accuracy as rules
are added. It shows an interesting behavior for classifying prostate tissue. Using only 20
rules out of the 1,028 evolved ones, the overall accuracy is 90%; 1.3% of the pixels are
incorrectly classified, and 8.7% are left unclassified. On inspection, most of the misclassified
pixels belong to borders between tissues, where mislabeling arises from the image
discretization (one pixel containing different tissue types). Table 3 presents the initial four
rules, which cover 80% of the instances belonging to the two larger tissue types,
epithelium and fibrous stroma. Such results are relevant, not only for their accuracy, but
also because of the insight they provide to the spectroscopist about the problem structure.


6  Conclusions  and  further  work 

This paper has presented the initial results achieved in predicting prostate tissue type using
GBML techniques. Being able to classify unseen tissue quickly, reliably, and accurately is
the first step towards the creation of CADx systems that may assist a pathologist diagnosing
prostate cancer. We have proposed two main efficiency enhancement techniques for
GBML (exploiting hardware parallelization via vector instructions and coarse-grain
parallelism via the usage of MPI libraries), which allowed us to approach very large data sets.
These techniques, together with an incremental genetics-based rule learning approach to
Fig. 8 The figures presented above show the regenerated prostate tissue classification image presented in
Fig. 2. (a) presents the correctly classified pixels, (b) presents the incorrectly classified pixels
assemble rule sets formed by maximally general and maximally accurate rules, have led to
the creation of NAX, a system specialized in dealing with large data sets.

Results have shown accurate classification models for prostate tissue along with good
scalability of the NAX implementation. Results also reveal peculiarities of the underlying
problem structure. With very few rules (20), we were able to correctly classify up to 90%
Fig. 9 The rule set as a decision list. The figure presents the classification accuracy as we
keep adding rules to the decision list (horizontal axis: number of rules used). The first 20 rules are
able to cover 91% of the records with a classification accuracy of 98.5%, which corresponds to the
90% overall accuracy presented in the figure


Table 3 Top four maximally general and maximally accurate rules that compose the final rule set. The
rule set is treated as a decision list, so the value of the initial four rules can easily be evaluated
incrementally

Rule  Rule condition                                      Tissue type     Accumulated   Covered
                                                                          accuracy (%)  records (%)
1.    0.10 < a_1 < 0.25 ∧ 0.00 < a_4 < 0.04 ∧             Fibrous stroma  41.32         41.96
      1.07 < a_8 < 2.01 ∧ -0.07 < a_16 < 0.16 ∧
      0.25 < a_17 < 2.86 ∧ 0.11 < a_18 < 0.21
2.    0.03 < a_1 < 0.11 ∧ 0.05 < a_7 < 0.20 ∧             Epithelium      68.53         69.61
      1231.88 < a_12 < 1247.90 ∧ 1.98 < a_17 < 3.83 ∧
      0.13 < a_18 < 0.20
3.    0.07 < a_0 < 0.16 ∧ 0.14 < a_1 < 0.41 ∧             Fibrous stroma  71.59         72.75
      0.71 < a_10 < 1.13 ∧ 1527.54 < a_15 < 1533.80 ∧
      0.65 < a_19 < 1.50
4.    0.05 < a_2 < 0.09 ∧ 0.76 < a_4 < 1.29 ∧             Epithelium      80.78         82.08
      1.80 < a_6 < 2.08 ∧ 0.17 < a_7 < 0.24 ∧
      0.26 < a_16 < 0.53 ∧ 2.79 < a_17 < 7.01 ∧
      0.21 < a_18 < 0.32

of  the  tissue.  Our  current  work  is  focused  on  analyzing  the  larger  data  sets  containing  all 
the  available  features  and  different  tissue  sources  to  test  the  parallelization  scalability  of 
NAX  on  NCSA  supercomputers.  Once  accomplished,  the  procedure  will  provide  confidence 
in  creating  a  CADx  system  to  generate  a  diagnosis  based  on  the  evolved  models. 

Acknowledgments We would like to thank David E. Goldberg for his continual support and
encouragement, allowing us to have access to the IlliGAL resources. Thanks also to Kumara Sastry for hallway
discussions and to the Automated Learning Group and the Data-Intensive Technologies and Applications at
the National Center for Supercomputing Applications for hosting this joint collaboration. This work was
sponsored by the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grant
FA9550-06-1-0370, the National Science Foundation under grant IIS-02-09199, and the National Institutes of
Health. The US Government is authorized to reproduce and distribute reprints for Government purposes
notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the
authors and should not be interpreted as necessarily representing the official policies or endorsements, either
expressed or implied, of the Air Force Office of Scientific Research, the National Science Foundation, or the
US Government. Rohit Bhargava would like to acknowledge collaborators over the years, especially Dr.
Stephen M. Hewitt and Dr. Ira W. Levin of the National Institutes of Health, for numerous useful
discussions and guidance. Funding for this work was provided in part by the University of Illinois Research Board
and by the Department of Defense Prostate Cancer Research Program. This work was also funded in part by
the National Center for Supercomputing Applications and the University of Illinois, under the auspices of
the NCSA/UIUC faculty fellows program.


References 

Amdahl G (1967) Validity of the single processor approach to achieving large-scale computing capabilities.
In Proceedings of the American federation of information processing societies conference (AFIPS).
30:483-485 AFIPS

Bacardit J, Butz M (2006) Advances at the frontier of Learning Classifier Systems. Chapter data mining in
Learning Classifier Systems: Comparing XCS with GAssist, vol I. Springer

Bacardit J, Krasnogor N (2006) Biohel: Bioinformatics-oriented hierarchical evolutionary learning
(Nottingham ePrints). University of Nottingham

Barry A, Drugowitsch J (1997) LCSWeb: the LCS wiki, http://www.lcsweb.cs.bath.ac.uk/

Bernado E, Llora X, Garrell J (2001) Advances in Learning Classifier Systems: 4th international workshop
(IWLCS 2001). Chapter XCS and GALE: a comparative study of two Learning Classifier Systems with
six other learning algorithms on classification tasks. Springer, Berlin, Heidelberg, pp 115-132

Bhargava R, Fernandez D, Hewitt S, Levin I (2006) High throughput assessment of cells and tissues:
Bayesian classification of spectral metrics from infrared vibrational spectroscopic imaging data.
Biochimica et Biophysica Acta 1758(7):830-845

Cantu-Paz E (2000) Efficient and accurate parallel genetic algorithms. Kluwer Academic Publishers

Cordon O, Herrera F, Hoffmann F, Magdalena L (2001) Genetic fuzzy systems. Evolutionary tuning and
learning of fuzzy knowledge bases. World Scientific

Fernandez D, Bhargava R, Hewitt S, Levin I (2005) Infrared spectroscopic imaging for histopathologic
recognition. Nat Biotechnol 23(4):469-474

Flockhart I (1995) GA-MINER: parallel data mining with hierarchical genetic algorithms (final report).
(Technical Report EPCCAIKMS-GA-MINER-REPORT 1.0). University of Edinburgh

Gabriel E, Fagg G, Bosilca G, Angskun T, Dongarra J, Squyres J, Sahay V, Kambadur P, Barrett B,
Lumsdaine A, Castain R, Daniel D, Graham R, Woodall T (2004) Open MPI: goals, concept, and design
of a next generation MPI implementation. In Proceedings of the 11th European PVM/MPI Users' group
meeting. Springer

Goldberg D (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley
Professional

Goldberg D (2002) The design of innovation: lessons from and for competent genetic algorithms. Springer

Grama A, Gupta A, Karypis G, Kumar V (2003) Introduction to parallel computing. Addison-Wesley

Holte R (1993) Very simple classification rules perform well on most commonly used datasets. Mach Learn
11:63-91

Lattouf J-B, Saad F (2002) Gleason score on biopsy: is it reliable for predicting the final grade on pathology?
BJU Int 90:694-699

Levin I, Bhargava R (2005) Fourier transform infrared vibrational spectroscopic imaging: integrating
microscopy and molecular recognition. Annu Rev Phys Chem 56:429-474

Llora X (2002) Genetics-based machine learning using fine-grained parallelism for data mining. Doctoral
dissertation, Enginyeria i Arquitectura La Salle. Ramon Llull University, Barcelona, Catalonia,
European Union

Llora X (2006) Learning Classifier Systems and other genetics-based machine learning Blog.
http://www-illigal.ge.uiuc.edu/lcs-n-gbml/

Llora X, Garrell J (2001) Knowledge-independent data mining with fine-grained parallel evolutionary
algorithms. In Proceedings of the genetic and evolutionary computation conference (GECCO'2001).
Morgan Kaufmann Publishers, pp 461-468

Llora X, Goldberg D (2003) Bounding the effect of noise in multiobjective Learning Classifier Systems.
Evol Comput J 11(3):279-298

Llora X, Sastry K (2006) Fast rule matching for Learning Classifier Systems via vector instructions. In
Proceedings of the 2006 genetic and evolutionary computation conference. ACM Press, pp 1513-1520

Llora X, Sastry K, Goldberg D (2005) The compact classifier system: motivation, analysis and first results.
In Proceedings of the congress on evolutionary computation, vol 1. IEEE Press, pp 596-603 (Also as
IlliGAL TR No 2005019)

Llora X, Sastry K, Goldberg D, de la Ossa L (2007) The χ-ary extended compact classifier system: linkage
learning in Pittsburgh LCS. In Advances at the frontier of Learning Classifier Systems, vol II. IlliGAL
report no 2006015. Springer (in preparation)

Merz CJ, Murphy PM (1998) UCI repository for machine learning data-bases,
http://www.ics.uci.edu/~mlearn/MLRepository.html

Mitchell T (1997) Machine learning. McGraw Hill

Orriols-Puig A, Bernado-Mansilla E (2006) A further look at UCS classifier system. In Proceedings of the
8th annual conference on genetic and evolutionary computation workshop program. ACM Press

Quinlan JR (1993) C4.5: Programs for machine learning. Morgan Kaufmann

Stone C, Bull L (2003) For real! XCS with continuous-valued inputs. Evol Comput J 11(3):279-298

Wilson S (1995) Classifier fitness based on accuracy. Evol Comput 3(2):149-175

Wilson S (2000a) Get real! XCS with continuous-valued inputs. Lect Notes Comput Sci 1813:209-219

Wilson S (2000b) Mining oblique data with XCS. In Revised papers of the 3rd international workshop on
Learning Classifier Systems (IWLCS 2000). Springer, pp 158-176


Towards  Better  than  Human  Capability  in  Diagnosing 
Prostate  Cancer  Using  Infrared  Spectroscopic  Imaging 


Xavier Llora1, Rohith Reddy2,3, Brian Matesic2, and Rohit Bhargava2,3

1National Center for Supercomputing Applications (NCSA)
2Department of Bioengineering
3Beckman Institute for Advanced Science and Technology
University of Illinois at Urbana-Champaign, Urbana IL 61801
xllora@uiuc.edu, rkreddy2@uiuc.edu, matesic2@uiuc.edu, rxb@uiuc.edu


ABSTRACT 

Cancer diagnosis is essentially a human task. Almost universally,
the process requires the extraction of tissue (biopsy)
and examination of its microstructure by a human. To improve
diagnoses based on limited and inconsistent morphologic
knowledge, a new approach has recently been proposed
that uses molecular spectroscopic imaging to utilize microscopic
chemical composition for diagnoses. In contrast to
visible imaging, the approach results in very large data sets,
as each pixel contains the entire molecular vibrational
spectroscopy data from all chemical species. Here, we propose
data handling and analysis strategies to allow computer-based
diagnosis of human prostate cancer by applying a
novel genetics-based machine learning technique (NAX). We
apply this technique to demonstrate both fast learning and
accurate classification that, additionally, scales well with
parallelization. Preliminary results demonstrate that this
approach can improve current clinical practice in diagnosing
prostate cancer.
1. INTRODUCTION

Pathologist opinion of structures in stained tissue is the
definitive diagnosis for almost all cancers and provides critical
input for therapy. In particular, prostate cancer accounts
for one-third of noncutaneous cancers diagnosed in US men,
and it is a leading cause of cancer-related death. Hence,
it is, appropriately, the subject of heightened public awareness
and widespread screening. If prostate-specific antigen
(PSA) or digital rectal screens are abnormal, a biopsy is
considered to detect or rule out cancer. Prostate tissue is
extracted, or biopsied, from the patient and examined for
structural alterations. The diagnosis procedure involves the
removal of cells or tissues, staining them with dyes to provide
visual contrast, and examination under a microscope by
a skilled person (pathologist).

The  challenge  in  prostate  cancer  research  and  practice 
is  to  provide  a  novel  Due  to  personnel,  tarining,  natural 
variability  and  biologic  differences,  the  challenge  in  prostate 
cancer  research  and  practice  is  to  provide  accurate,  objec¬ 
tive  and  reproducible  decisions.  Conventional  optical  mi¬ 
croscopy  followed  by  manual  recognition  has  been  demon¬ 
strated  to  be  inadequate  for  this  task.  [l8].  Hence,  we  have 
recently  proposed  developing  a  practical  approach  to  this 
problem  using  chemical,  rather  than  morphologic,  imaging. 

19  .  In  this  approach,  Fourier  transform  infrared  imag¬ 
ing  (FTIR)  is  employed  to  provide  the  entire  vibrational 
spectroscopic  information  from  every  pixel  of  a  sample’s  mi¬ 
croscopy  image.  While  the  first  steps  of  developing  novel 
imaging  and  sampling  technologies  is  now  reliable,  7  the 
computational  challenge  of  providing  robust  classification 
algorithms  that  can  rapidly  provide  decisions  remains.  Due 
to  the  above  advances  in  imaging  and  sampling,  data  from 
thousands of patients are available to train and validate
algorithms for different disease states. While the application
and type of data are unique, a further confounding factor is
the need to efficiently process the large volumes of data
generated by FTIR imaging. The classification problem can be
formulated as a supervised learning problem in which several
million pixels (hundreds of gigabytes) of accurately labeled
data are available for model training and validation. The
volume of tissue and (future) need for intra-operative
diagnoses imply that rapid and accurate diagnoses are crucial
to allow physicians to explore all possible courses of action.
Under these conditions, traditional supervised learning
approaches and implementations do not scale to provide
diagnoses in an appropriate time frame. Hence, efficiently
processing and learning models from gigabytes of FTIR imaging
data requires a careful design of the supervised learning
algorithm.  Moreover,  the  biological  nature  of  the  problem 
requires  that  such  models  be  interpretable  to  provide  funda¬ 
mental  new  insight  into  the  disease  process.  Genetics-based 
machine  learning  (GBML)  techniques  take  advantage  of  the 
"quasi embarrassing parallelism" [17] to provide scalable,
fast, accurate, reliable, and interpretable models. In this
paper we present an approach engineered to the desired
solution and constraints of addressing this human task. A
modified version of a sequential genetics-based rule learner
that exploits massive parallelism via the message passing
interface (MPI) and efficient rule matching using
hardware-oriented operations is developed. We named this system
NAX [24], and we have shown that its performance is comparable
to traditional and genetics-based machine learning techniques
on an array of publicly available data sets. We now show that
NAX, taking advantage of both hardware and software
parallelism, is able to provide prostate cancer diagnoses that
are human-competitive. In this paper, we present preliminary
results supporting this outcome.

The paper is structured as follows. Section 2 provides an
overview of our approach towards computer-aided diagnoses for
prostate cancer. The procedure and form of the data are
summarized in section 3. NAX is introduced in section 4, where
we describe the basic components and design decisions in this
approach. In section 5 we present preliminary results
indicating that the approach presented in this paper is
human-competitive. Finally, section 6 summarizes some
conclusions and further research.

2.  PROBLEM  DESCRIPTION 

Prostate  cancer  is  the  most  common  non-skin  malignancy 
in  the  western  world.  The  American  Cancer  Society 
estimated 234,460 new cases of prostate cancer in 2006
[31]. Recognizing the public health implications of this
disease,  men  are  actively  screened  through  digital  rectal 
examinations  and/or  serum  prostate  specific  antigen  (PSA) 
level  testing.  If  these  screening  tests  are  suspicious,  prostate 
tissue  is  extracted,  or  biopsied,  from  the  patient  and  exam¬ 
ined  for  structural  alterations.  Due  to  imperfect  screening 
technologies  and  repeated  examinations,  it  is  estimated  that 
more  than  1  million  people  undergo  biopsies  in  the  US  alone. 

2.1  Prostate  Cancer  Diagnosis 

The  removal  of  a  small  section  of  prostate  is  most  of¬ 
ten  accomplished  by  core  biopsy.  A  needle  is  inserted  into 
the  tissue  and  several  (6-23)  samples  are  obtained  from  dif¬ 
ferent  positions.  Biopsy,  followed  by  manual  examination 
under  a  microscope  is  the  primary  means  to  definitively  di¬ 
agnose  prostate  cancer  as  well  as  most  internal  cancers  in 
the  human  body.  Pathologists  are  trained  to  recognize  pat¬ 
terns  of  disease  in  the  architecture  of  tissue,  local  structural 
morphology  and  alterations  in  cell  size  and  shape.  Specific 
patterns  of  specific  cell  types  distinguish  cancerous  and  non- 
cancerous  tissues.  Hence,  the  primary  task  of  the  patholo¬ 
gist  examining  tissue  for  cancer  is  to  locate  foci  of  the  cell 
of  interest  and  examine  them  for  alterations  indicative  of 
disease. 

The specific cells in which cancer arises in the prostate
are epithelial cells. While epithelial-origin cancers account
for over 85% of all human cancers, they account for more
than 95% of prostate cancers. In prostate tissue, epithelial
cells line secretory ducts within the structural cells
(collectively termed 'stroma') that allow the tissue to
maintain its structure and function. Hence, a pathologist will
first locate epithelial cells in a biopsy and, to examine for
cancer, will mentally segment them from stroma.

Biopsy  samples  are  prepared  in  a  specific  manner  to  aid 
in  recognition  of  cells  and  disease.  The  sample  is  sliced  thin 
(~5 μm thickness), placed on a glass slide and stained with
a  dye  to  provide  contrast.  The  most  common  dye  is  a  mix¬ 
ture  of  hematoxylin  and  eosin  (H&E),  which  stains  protein- 
rich  regions  pink  and  nucleic  acid-rich  regions  blue.  Empty 
space,  lipids  and  carbohydrates  are  typically  not  stained  and 
characterized  by  white  color  in  images.  Staining  allows  the 
pathologist  to  identify  cells  based  on  their  nucleus  and  extra- 
nuclear  regions.  Patterns  of  the  same  cell  type  characterize 
structures.  For  example,  epithelial  cells  arranged  in  a  circu¬ 
lar  manner  around  empty  space  are  characteristic  of  a  duct 
and  endothelial  cells  similarly  arranged  are  characteristic  of 
blood  vessels.  The  empty  space  enclosed  within  a  duct  in 
pathology  images  is  termed  a  lumen.  The  distortion  of  the 
circular  pattern  of  epithelial  cells  around  a  lumen  is  charac¬ 
teristic  of  cancer. 

In  low  severity  cancers,  lumens  are  only  slightly  distorted, 
while  higher  grades  of  cancer  display  a  lack  of  lumen  and 
simply  consist  of  masses  of  epithelial  cells  supported  by  little 
stroma. The relative distortion and change in lumen shape
are organized into a grading scheme to assess the severity of
the disease, the Gleason scoring system, which is the primary
measure of disease that defines diagnosis, helps direct
therapy and helps predict those at danger of dying from the
disease.  Since  prostate  cancer  is  multi-focal  and  the  disease 
quite  variable,  two  dominant  patterns  of  epithelial  distortion 
are  selected  and  each  is  independently  graded  on  a  scale  of 
1-5.  The  grades  are  then  summed  to  provide  a  Gleason  score 
ranging  from  2  (low  grade  cancer)  to  10  (maximum  danger 
cancer).  This  scale  has  been  widely  used  since  its  creation 
in  the  1960s  and  currently  forms  the  clinical  standard  of 
practice.  Manual  Gleason  scoring,  however,  has  severe  lim¬ 
itations. 
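
The scoring arithmetic described above can be sketched in a few
lines. This is a minimal illustration only, assuming the two
dominant pattern grades have already been assigned as integers;
the function name `gleason_score` is hypothetical and is not
taken from any clinical software.

```python
def gleason_score(primary_grade: int, secondary_grade: int) -> int:
    """Sum the grades (1-5) of the two dominant patterns into a score (2-10)."""
    for grade in (primary_grade, secondary_grade):
        if not 1 <= grade <= 5:
            raise ValueError("each pattern grade must lie between 1 and 5")
    return primary_grade + secondary_grade
```

Because the score is a sum, different grade pairs can produce
the same score, so a change of one grade in either pattern
shifts the reported score by one.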

2.2  Limitations  of  Current  Practice 

Widespread screening for prostate cancer has resulted in
a large workload of biopsied men [16], placing an increasing
demand on services. Operator fatigue is well documented,
and guidelines limit the workload and rate of examination
of samples by a single operator (examination speed and
throughput). Importantly, inter- and intra-pathologist
variation complicates decision-making. The consistency in
determining Gleason scores is rather poor. Intra-observer
measurements show that a pathologist confirms their own score
less than 50% of the time and is within ±1 of that score in
no more than 80% of cases [2]. Hence, the diagnoses for ~50%
of cases may change and may be significantly altered for ~20%
of cases, ultimately leading to changes in therapy for a
patient subset [30]. The numbers are decidedly cause for
concern. For example, in a recent study including 15
pathologists and 537 prostate cancer patients, 70.8% of
Gleason scores were shown to be inaccurate when compared with
the patient's final outcome [13]. Second opinions [29] improve
assessment and are cost-effective [10], not to mention their
utility in mitigating the effects of healthcare costs, lost
wages, morbidity, or potential litigation. In summary, the
manual recognition of spatial patterns leaves much to be
desired from a process perspective and has far-reaching
social effects from a public health perspective.

For  the  reasons  underlined  above,  there  is  an  urgent  need 
for  high-throughput,  automated  and  objective  pathology 
tools.  We  believe  that  this  need  is  best  met  by  employing 
the  power  of  computer  algorithms  and  advanced  processing 
to  address  prostate  cancer  diagnosis  and  grading. 

The  information  content  of  conventionally  stained  images 
is  limited,  inherently  non-specific  and  varies  greatly  within 
patient  populations  and  processing  conditions.  Hence,  the 
information  derived  from  visible  microscopy  images  is  fun¬ 
damentally  limited  and  automated  methods  of  analyzing 
stained images have failed to provide a sufficiently robust
algorithm to diagnose disease. An alternative to
morphology-based microscopy is the use of molecular microscopy
techniques to probe disease. Molecular technologies for
disease diagnosis are an exciting avenue for investigation, as
they promise better diagnostic capabilities through objective
means and a multitude of chemicals to provide insight into
the changes indicative of the disease process. In particular,
spectroscopy tools allow for the measurement of many molecular
species simultaneously. Spectroscopic techniques in imaging
form, notably using optics, further enable the analysis to
be conducted without perturbing the tissue [11]. In this
manuscript,  we  present  the  analysis  of  prostate  tissue  with 
one  such  technique,  Fourier  transform  infrared  (FTIR)  spec¬ 
troscopic  imaging. 

2.3  Molecular  Imaging 

Infrared  spectroscopy  is  a  classical  technique  for  measur¬ 
ing  the  chemical  composition  of  specimens.  At  specific  fre¬ 
quencies,  the  vibrational  modes  of  molecules  are  resonant 
with  the  frequency  of  infrared  light.  By  monitoring  all  fre¬ 
quencies  in  the  region,  a  pattern  of  absorption  can  be  cre¬ 
ated.  This  pattern,  or  spectrum,  is  characteristic  of  the 
chemical  composition  and  is  hypothesized  to  contain  infor¬ 
mation  that  will  help  determine  the  cell  type  and  disease 
state of the tissue. Recently, FTIR spectroscopy has been
developed in an imaging sense. Hence, the data are similar
to those of optical microscopy, with two differences. The
first is that no external dyes are needed and the contrast in
images can be obtained directly from the chemical composition
of the tissue. The second is that each pixel in a visible
image contains RGB values, whereas each pixel in IR imaging
contains several thousand values across a bandwidth
(2000-14000 nm) that is ~40 times larger than the visible
spectrum (400-700 nm) [7].
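
The factor of ~40 quoted above follows directly from the two
wavelength ranges. The short check below is only a worked
calculation using the limits stated in the text; the variable
and function names are illustrative.

```python
MID_IR_NM = (2000, 14000)   # mid-infrared range quoted in the text
VISIBLE_NM = (400, 700)     # visible range quoted in the text

def bandwidth(span):
    """Width of a wavelength range given as (low, high) in nanometres."""
    low, high = span
    return high - low

# 12000 nm of mid-IR bandwidth against 300 nm of visible bandwidth.
ratio = bandwidth(MID_IR_NM) / bandwidth(VISIBLE_NM)
```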

3.  DATA  AND  METHODOLOGY 
3.1  Experimental  Details 

Prostate  tissues  were  obtained  from  Cooperative  Hu¬ 
man  Tissue  Network  for  the  tissue  array  research  program 
(TARP)  laboratory.  Using  these  tissues,  tissue  microarrays 
were  prepared  using  a  Beecher  automated  tissue  arrayer  con¬ 
taining  a  video  overlap  system  and  0.6mm  needles.  Appro¬ 
priate  institutional  review  board  and  National  Institutes  of 
Health (USA) guidelines for the protection of human subjects
were followed. Sections of tissue 5 μm thick were floated on
an infrared transmissive optical window for FTIR spectroscopic
imaging. Another 5 μm section obtained from the same point
on the tissue specimen was observed using traditional
microscopy for comparison. Expert pathologists determined
the tissue classification using these microscopy samples by
staining with H&E. The pathologists' classifications were used
as the 'gold standard' for comparison with the results from
the methods mentioned in this paper.

Figure 1: Conventional Staining and Automated
Recognition by Chemical Imaging. (A) Typical
H&E stained sample, in which structures are deduced
from experience by a human. Highlighting of
specific regions in the manner of H&E is possible
using FTIR imaging without stains. (B) Absorption
at 1080 cm⁻¹, commonly attributed to nucleic
acids, and (C) to proteins of the stroma. The data
obtained are 3-dimensional (D), from which spectra
(E) or images at specific spectral features may be
plotted.

Tissues  were  analyzed  using  a  Michelson  interferometer 
attached  to  a  microscope  (Perkin-Elmer  Spotlight  300)  in 
transmission mode at a resolution of 4 cm⁻¹. The sample
was then raster scanned to obtain images of the entire
specimen. Typical specimen size is 600 μm × 600 μm, with each
pixel being 6.25 μm × 6.25 μm on the sample plane. Spectra
are composed of 1,641 sample points over the spectral range
4,000-720 cm⁻¹. Data acquisition using these techniques
required 40 minutes per cylindrical core of the tissue
microarray to yield a root mean square signal to noise ratio of
500:1. A typical array was composed of approximately 2.5
million  pixels  and  required  40  GB  of  storage  space. 

The data obtained from FTIR imaging are three-dimensional.
The x- and y-dimensions locate pixels on the tissue-sample
plane. The z-dimension values compose the IR spectrum for the
corresponding pixel. The spectra can be analyzed to determine
what type of tissue (epithelium, stroma, or muscle) the
specimen is, as well as whether the tissue is malignant or
benign. We have developed this technology to provide data from
tissue in minutes and employ a high-throughput sampling
strategy using Tissue Microarrays (TMA) to obtain data [19].
Samples from multiple tissues, multiple patients and multiple
clinical settings are included in the data set to maximize the
sampling of natural variability and ensure the development of
robust analysis algorithms. These high-throughput imaging and
microarray technologies combine to provide very large data
sets (see Figure 1). A typical single core consists of
300 × 300 pixels on the x-y plane with 1641 bands on the
z-axis. A tissue microarray consists of several hundred such
cores, and analysis of such large datasets (typically, tens of
GB) is computationally expensive.
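
The three-dimensional layout described above can be pictured
with a small container that stores one spectrum per pixel. This
is a minimal sketch and not the storage format used in this
work: the class name `SpectralCube`, its method names, and the
choice of a flat single-precision buffer are all assumptions
made for illustration.

```python
from array import array

class SpectralCube:
    """Flat storage for an (x, y, z) cube; z indexes the spectral band."""

    def __init__(self, width, height, bands):
        self.width, self.height, self.bands = width, height, bands
        self.values = array("f", [0.0]) * (width * height * bands)

    def _offset(self, x, y):
        # All bands of one pixel are stored contiguously (assumed layout).
        return (y * self.width + x) * self.bands

    def set_spectrum(self, x, y, spectrum):
        """Store the full spectrum (z-dimension) of pixel (x, y)."""
        if len(spectrum) != self.bands:
            raise ValueError("spectrum length must equal the number of bands")
        start = self._offset(x, y)
        self.values[start:start + self.bands] = array("f", spectrum)

    def spectrum(self, x, y):
        """Return all band values (the z-dimension) of pixel (x, y)."""
        start = self._offset(x, y)
        return self.values[start:start + self.bands]

    def band_image(self, band):
        """Return one band for every pixel as rows of a 2-D image."""
        return [[self.values[self._offset(x, y) + band]
                 for x in range(self.width)]
                for y in range(self.height)]
```

With the figures quoted above, a single 300 × 300 core with
1641 bands holds 300 · 300 · 1641 = 147,690,000 values, which
is why a microarray of several hundred cores quickly reaches
the data volumes discussed in the text.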

3.2  Data  Format 

Each pixel's z-dimension contains a spectrum characteristic
of the chemical composition of that region of the specimen.
Certain spectral quantities provide measures of chemistry.
For example, the height of each feature is proportional to
its abundance, the peak position is associated with the
vibrational identity and peak shape often reflects the
multitude of environments around the molecule. Therefore,
differences in spectral characteristics can be used in
classification, and these exact spectral features are termed
'metrics'. For example, the ratio of absorbance of the
spectral peak at 1080 cm⁻¹ to the spectral peak at 1545 cm⁻¹
is commonly used to distinguish epithelial from stromal cells.
Trained spectroscopists determine these metrics based upon
examination of spectral patterns. Hence, the reduction of full
spectra to descriptive metrics forms an intelligent
dimensionality reduction strategy. Genetic algorithms form
decision rules based upon these metrics to classify pixels by
tissue type. Furthermore, the transparency of the genetic
algorithms allows the scientist to correlate specific rules to
biological features (tissue type and cancer classification) via
metrics based upon spectral characteristics.
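
As an illustration of how one such metric could be derived from
a raw spectrum, the sketch below computes the 1080 cm⁻¹ to
1545 cm⁻¹ absorbance ratio. It assumes that the 1,641 samples
are evenly spaced from 4,000 down to 720 cm⁻¹ and stored in
that order, and it reads a single sample per peak; real metric
definitions (baseline correction, peak areas) may differ. The
function names are hypothetical.

```python
def band_index(wavenumber, first=4000.0, last=720.0, points=1641):
    """Map a wavenumber (cm^-1) to the nearest sample index of a spectrum."""
    step = (first - last) / (points - 1)   # 2 cm^-1 for the stated range
    index = round((first - wavenumber) / step)
    if not 0 <= index < points:
        raise ValueError("wavenumber outside the recorded range")
    return index

def peak_ratio_metric(spectrum, numerator=1080.0, denominator=1545.0):
    """Ratio of the absorbance sampled at two wavenumbers (assumed metric)."""
    return spectrum[band_index(numerator)] / spectrum[band_index(denominator)]
```

Reducing each 1,641-point spectrum to a handful of such ratios
is what turns a pixel into a short, interpretable feature
vector for the rule learner.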

4.  APPROACH 

In this section we review related work in the GBML
community, highlighting previous efforts to deal with large
data sets. We also present the motivation and techniques that
led to the design of NAX. Special attention is paid to the
description of the hardware and software techniques used,
as well as to the design of a scalable GBML algorithm.

4.1  Related  Background 

Bernado, Llora & Garrell [6] presented a first empirical
comparison between genetics-based machine learning (GBML)
techniques and traditional machine learning approaches. The
authors reported that GBML techniques were able to perform as
well as traditional techniques. Later on, Bacardit & Butz [3]
repeated the analysis, again obtaining similar results. Most
of the experiments presented in both papers were conducted
using publicly available data sets provided by the University
of California at Irvine repository [28]. Most of the data sets
are defined over tens of features and up to a few thousand
records. However, a key property of GBML approaches is their
intrinsic massive parallelism and scalability. Cantu-Paz [8]
presented how efficient and accurate genetic algorithms could
be assembled, and Llora [21] presented how such algorithms
can be efficiently used as machine learning and data mining
techniques.

GBML techniques require evaluating candidate solutions
against the original data set, that is, matching the candidate
solutions (e.g., rules, decision trees, prototypes) against all
the instances in the data set. Regardless of the GBML flavor
used, Llora & Sastry [25] showed that as the problem grows,
the matching process governs the execution time. For small
data sets (tens of attributes and a few thousand records)
the matching process takes more than 85% of the overall
execution time, marginalizing the contribution of the other
genetic operators. This number easily passes 99% when we
move to data sets with a few hundred attributes and a few
hundred thousand records. Such results emphasize one
unique facet of GBML approaches: scalability via exploiting
massive parallelism. More than 99% of the time required is
spent on evaluating candidate solutions. Each solution
evaluation is independent of the others and, hence, can be
computed in parallel. Moreover, the evaluation process can also
be parallelized further on large data sets by splitting and
distributing the data across the computational resources.
A detailed description of the parallelization alternatives of
GBML techniques can be found elsewhere [21].
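
The practical meaning of the 85% and 99% figures can be gauged
with Amdahl's law, which bounds the speedup obtainable when only
a fraction of the run time is parallelized. The sketch below is
a back-of-the-envelope aid, not a model of NAX: it assumes the
matching time divides perfectly across processors and ignores
communication cost.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only part of the work is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)
```

Under these assumptions, a workload that spends 85% of its time
matching cannot be accelerated by more than about 6.7 times,
whereas one that spends 99% of its time matching has a ceiling
of about 100 times, which is why large data sets benefit most
from parallel evaluation.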

Currently available off-the-shelf GBML methods and software
distributions [5, 20] do not usually target very large data
sets. Three different works need to be mentioned here.
Flockhart [12] proposed and implemented GA-MINER, one of the
earliest efforts to create data mining systems based on GBML
that scale across symmetric multi-processors and massively
parallel multi-processors. The work reviewed different
encoding and parallelization schemes and conducted proper
scalability studies. Llora [21] explored how fine-grained
parallel genetic algorithms could become efficient models for
data mining. Theoretical analyses of performance and
scalability were developed and validated with proper
simulations. Recently, Llora & Sastry [25] explored how
current hardware can be efficiently used to speed up the
required matching of solutions against the data set. These
three approaches are the basis of the incremental rule
learning proposed in the next section to approach very large
data sets, such as the prostate tissue classification one.

4.2  The  Road  to  Tractability 

NAX evolves, one at a time, maximally general and maximally
accurate rules. After each rule is evolved, the instances it
covers are removed and the next rule is added to the
previously stored ones, forming a decision list. This process
continues until no uncovered instances are left. Llora, Sastry
& Goldberg [26] showed that maximally general and maximally
accurate rules [32] could also be evolved using
Pittsburgh-style learning classifier systems. Later, Llora,
Sastry & Goldberg [27] showed that competent genetic
algorithms [15] evolve such rules quickly, reliably, and
accurately. From these early works, it can be inferred that
approaching real-world problems, such as prostate tissue
classification and cancer diagnosis, using GBML techniques may
produce the desired byproduct: proper scalability. We discuss
next efficient implementation techniques to deal with very
large data sets using NAX [24].
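
The covering loop described at the start of this section can be
sketched as follows. This is not the NAX implementation: the
genetic search is replaced by a trivial placeholder learner
(`memorize_first_example`) so that the example runs on its own,
and all names, the tuple-based rule representation, and the
`EPSILON` margin are assumptions made for illustration.

```python
EPSILON = 1e-9  # half-width of the placeholder intervals (assumed value)

def covers(rule, record):
    """True when every attribute lies strictly inside its interval."""
    bounds, _label = rule
    return all(low < value < high for (low, high), value in zip(bounds, record))

def memorize_first_example(examples):
    """Placeholder for the genetic search: a tiny box around one example."""
    record, label = examples[0]
    return [(v - EPSILON, v + EPSILON) for v in record], label

def build_decision_list(examples, evolve_rule):
    """Learn one rule, drop the examples it covers, repeat until none remain."""
    remaining = list(examples)
    decision_list = []
    while remaining:
        rule = evolve_rule(remaining)
        uncovered = [ex for ex in remaining if not covers(rule, ex[0])]
        if len(uncovered) == len(remaining):
            break  # no progress; stop instead of looping forever
        decision_list.append(rule)
        remaining = uncovered
    return decision_list

def classify(decision_list, record, default=None):
    """Return the class of the first rule in the list that covers the record."""
    for rule in decision_list:
        if covers(rule, record):
            return rule[1]
    return default
```

Only one rule is being searched for at any time, and the set of
remaining examples shrinks after every iteration, so both the
memory footprint and the matching cost fall as the list grows.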

4.3  Exploiting  the  Hardware 

Recently, multimedia and scientific applications have
pushed CPU manufacturers to include support for vector
instruction sets again in their processors. Both application
areas require heavy calculations based on vector arithmetic.
Simple vector operations such as add or product are
repeated over and over. During the 80s and 90s, supercomputers,
such as Cray machines, were able to issue hardware
instructions that took care of basic vector operations. A
more constrained scheme, however, has made its way into
general-purpose processors thanks to the push of multimedia
and scientific applications. The main chip manufacturers
(IBM, Intel, and AMD) have introduced vector instruction
sets (Altivec, SSE3, and 3DNow+) that allow performing
vector operations over packs of 128 bits in hardware. We
will focus on a subset of instructions that are able to deal
with floating point vectors. This subset of instructions
implements in hardware vector operations over groups
of four floating-point numbers. These instructions are the
basis of the fast rule matching mechanism proposed.

Figure 2: Parallel model implemented. Each processor runs an identical NAX
algorithm; the processors differ only in the portion of the population being
evaluated. The population is treated as a collection of chunks, where each
processor evaluates its own assigned chunk and shares the fitness of these
individuals with the rest of the processors. This approach minimizes
communication cost.

Our set of rules seeks both to correctly classify the prostate
data set and to provide biological insight. All the attributes
of the domain are real-valued, and the conditions of the rules
need to be able to express conditions in real-valued spaces.
We use a rule encoding similar to the one proposed by Wilson
[33] and widely used in the GBML community. Rules express
the conjunction of tests across attributes. Each test can be
defined in multiple fashions but, without loss of generality,
we pick a simple interval-based one. A simple example of an
if-then rule could be expressed as follows:

$1.0 < a_0 < 2.3 \wedge \cdots \wedge 10.0 < a_n < 23 \rightarrow \alpha$    (1)

where the condition is the conjunction of the different
attribute tests, as introduced earlier, and the consequent α
is the predicted class. We also allow a special condition,
don't care, which always returns true, to allow generalized
rules to evolve. The rule below illustrates an example of a
generalized rule.

$1.0 < a_0 < 2.3 \wedge -3.0 < a_3 < 2 \rightarrow \alpha$    (2)

All attributes except $a_0$ and $a_3$ were marked as don't care.
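
One way to picture this encoding is as one interval per
attribute, with don't care represented by an interval that every
value satisfies. The sketch below encodes rule (2) in that way.
The representation, the use of infinite bounds for don't care,
and the names `make_rule` and `matches` are assumptions for
illustration, not the encoding used inside NAX.

```python
import math

DONT_CARE = (-math.inf, math.inf)   # interval that every finite value satisfies

def make_rule(n_attributes, tests, label):
    """Build a rule from {attribute index: (low, high)}; the rest are don't care."""
    bounds = [DONT_CARE] * n_attributes
    for index, interval in tests.items():
        bounds[index] = interval
    return bounds, label

def matches(rule, record):
    """Conjunction of the interval tests, using strict bounds as in rule (1)."""
    bounds, _label = rule
    return all(low < value < high for (low, high), value in zip(bounds, record))

# Rule (2): 1.0 < a0 < 2.3 and -3.0 < a3 < 2, all other attributes don't care.
rule_2 = make_rule(5, {0: (1.0, 2.3), 3: (-3.0, 2.0)}, "alpha")
```

The more attributes a rule marks as don't care, the more
records it can match, which is what allows general rules to
emerge during evolution.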

Matching a rule requires performing the individual tests
before the final AND condition can be computed. Vector
instruction sets can help improve the performance of this
process by performing four tests at once. In effect, this
process can be regarded as four pipelines running in parallel.
The process can be improved further by stopping the matching
process as soon as any one test fails. The code implemented
assumes that the two vectors containing the upper and lower
bounds are provided and that records are stored in a
two-dimensional matrix. As also shown elsewhere [25],
exploiting the available hardware can speed up the matching
process by a factor of 3 to 3.5 [24].
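
The control flow of this matching scheme can be mirrored in a
few lines. The sketch below is illustrative only: Python does
not issue vector instructions, so each group of four comparisons
merely stands in for what a single 128-bit operation would do on
four single-precision values, and the function names are
hypothetical.

```python
LANES = 4  # four 32-bit floats fit in one 128-bit vector register

def match_in_packs(lower, upper, record):
    """True when lower[i] < record[i] < upper[i] holds for every attribute."""
    for start in range(0, len(record), LANES):
        stop = start + LANES
        pack = zip(lower[start:stop], record[start:stop], upper[start:stop])
        # On vector hardware these four tests would be evaluated at once.
        lane_results = [low < value < high for low, value, high in pack]
        if not all(lane_results):
            return False  # early exit: the remaining packs are skipped
    return True

def count_matches(lower, upper, records):
    """Match one rule (two bound vectors) against a matrix of records."""
    return sum(match_in_packs(lower, upper, row) for row in records)
```

The early exit matters because a rule that fails on one of its
first attributes never touches the rest of the record.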

4.4  Massive  Parallelism 

Since most of the time is spent on the evaluation of
candidate rules when dealing with large data sets, our next goal
was to find a parallelization model that could take advantage
of this feature. Due to the embarrassing parallelism model
[17] for rule evaluation, we designed a coarse-grain parallel
model for distributing the evaluation load. Cantu-Paz [8]
proposed several schemes, showing the importance of the
trade-off between computation time and time spent
communicating. When designing the parallel model, we focused on
minimizing the communication cost. Usually, a feasible
solution could be a master/slave one, when the computation time
is much larger than the communication time. However, GBML
approaches tend to use rather large populations, forcing us
to send rules to the evaluation slaves and collect the resulting
fitness. This scheme also increases the number of sequential
instructions that cannot be parallelized, reducing the overall
speedup of the parallel implementation as a result of Amdahl's
law [1].

To minimize communication cost, each processor runs
an identical NAX algorithm, all seeded in the same manner
and, hence, performing the same genetic operations. They
only differ in the portion of the population being evaluated.
Thus, the population is treated as a collection of chunks where
each processor evaluates its own assigned chunk, sharing the
fitness of the individuals in its chunk with the rest of the
processors. In this manner, fitness can be encapsulated and
broadcast, maximizing the occupation of the underlying packing
frames used by the network infrastructure. Moreover,
this approach also removes the need for sending the actual
rules back and forth between processors, as a master/slave
approach would require, thus keeping the communication
to the bare minimum, namely, the fitness. Figure 2
presents a conceptual scheme of the parallel architecture of
NAX.

Figure 3: (a) Original labeled array. (b) Automatically classified array.
The left-hand side presents the original labeled data contained in the P80
array. The right-hand side presents the reconstructed image based on the
predictions issued by the rule set evolved by NAX. Green represents
non-cancerous tissue spots; red represents malignant tissue spots.

To  implement  the  model  presented  in  Figure  2,  we  used 
C and the open message passing interface (openMPI)
implementation [13]. Each processor computes which individuals
are  assigned  to  it.  Then  it  computes  the  fitness  and,  finally, 
it  broadcasts  the  computed  fitness.  The  rest  of  the  process 
is  unchanged.  Except  for  the  cooperative  evaluation,  all  the 
processors  generate  the  same  evolutionary  trace. 
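
The division of labor described above can be simulated without
any message passing. The sketch below is not the C and MPI code
used in this work: it loops over the processor ranks in a
single process, and writing into a shared list stands in for the
broadcast of fitness values. The even split with the remainder
given to the lowest ranks, and both function names, are
assumptions.

```python
def chunk_bounds(population_size, n_processors, rank):
    """Half-open slice [start, stop) of the population owned by one rank."""
    base, extra = divmod(population_size, n_processors)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def cooperative_evaluation(population, fitness, n_processors):
    """Simulate all ranks: each scores its own chunk, then results are shared."""
    shared = [None] * len(population)
    for rank in range(n_processors):
        start, stop = chunk_bounds(len(population), n_processors, rank)
        local = [fitness(individual) for individual in population[start:stop]]
        shared[start:stop] = local  # stands in for broadcasting the fitness
    return shared
```

Because every rank can compute `chunk_bounds` on its own, no
rules need to be exchanged; only the fitness values travel
between processors.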


4.5  Lists  of  Maximally  General  and  Maximally  Accurate  Rules

One main characteristic of the so-called Pittsburgh-style
learning classifier systems, a particular type of GBML, is
that the individuals encode a rule set [14, 22, 15]. Thus,
evolutionary mechanisms directly recombine one rule set
against another one. For classification tasks of moderate
complexity, the rule sets are not large. For complex
problems, however, the potential number of rules required to
ensure accurate classification may use prohibitively large
amounts of memory. The requirements increase even further
in the presence of noise [23]. Hence, this family of
GBML techniques works very well on moderate complexity
problems [6], [3], but needs to be modified for complex and
large data sets.

A sequential rule learning approach may alleviate the
requirements by evolving only one rule at a time, hence
reducing the memory requirements [9], [2]. This allows
maintaining a relatively small memory footprint, which makes
processing large data sets feasible. However, an incremental
approach to the construction of the rule set requires paying
special attention to the way rules are evolved. For each run
of the genetic algorithm, we would like to obtain a maximally
general and maximally accurate rule, that is, a rule that
covers the maximum number of examples without making mistakes
[32]. NAX (our proposed incremental rule learner) evolves
maximally general and maximally accurate rules by computing
the accuracy (α) and the error (ε) of a rule [26]. In a
Pittsburgh-style classifier, the accuracy may be computed as
the proportion of overall examples correctly classified, and
the error is the proportion of incorrect classifications issued.
Once the accuracy and error of a rule are known, the fitness
can be computed as follows.

$$f(r) = a(r) \cdot e(r)^{\gamma} \qquad (3)$$

where $\gamma$ is the error penalization coefficient. We have set $\gamma$
to 18 to guarantee that the evolutionary process will produce
maximally general and maximally accurate solutions.
Further details may be found elsewhere [24]. The above
fitness measure favors rules with a good classification accuracy
and a low error, that is, maximally general and maximally
accurate rules. By increasing $\gamma$, we can bias the search towards
correct rules. This is an important element because
assembling a rule set based on accurate rules guarantees the
overall performance of the assembled rule set. NAX's efficient
implementation of the evolutionary process is based on the
hardware acceleration techniques described in section 4.3
and the coarse-grain parallelism described in section 4.4. The genetic
algorithm used was a modified version of the simple genetic
algorithm [14] using tournament selection (s = 4), one point
crossover, and mutation based on generating new random
boundary elements.
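
As a rough illustration of the fitness in equation (3) and of tournament selection with s = 4, consider the following Python sketch. It is not the NAX implementation. The function names and the count-based arguments are hypothetical, and the accuracy is simplified to the share of all examples that the rule covers and classifies correctly. For the exponent (the error penalization coefficient, written gamma here) to act as a penalty, the factor raised to it has to shrink as a rule makes more mistakes, so the sketch assumes that e(r) is evaluated as the fraction of the rule's issued classifications that are correct; that reading is our assumption, and the exact definitions are given in [24].

```python
import random


def rule_fitness(n_correct, n_matched, n_total, gamma=18.0):
    """Fitness a(r) * e(r)**gamma of a single rule (illustrative only).

    n_correct: examples the rule matches and classifies correctly
    n_matched: examples the rule matches (classifications issued)
    n_total:   all training examples
    """
    if n_matched == 0:
        return 0.0  # a rule that matches nothing carries no information
    a = n_correct / n_total    # simplified accuracy over all examples
    e = n_correct / n_matched  # ASSUMPTION: share of issued classifications that are right
    return a * e ** gamma


def tournament_select(population, fitnesses, s=4, rng=random):
    """Draw s individuals at random (with replacement) and return the fittest."""
    contestants = [rng.randrange(len(population)) for _ in range(s)]
    best = max(contestants, key=lambda i: fitnesses[i])
    return population[best]


# A rule covering half the data without mistakes ...
general_and_accurate = rule_fitness(n_correct=500, n_matched=500, n_total=1000)
# ... versus a rule that covers more examples but makes 30 mistakes.
broader_but_sloppy = rule_fitness(n_correct=570, n_matched=600, n_total=1000)
```

With gamma set to 18, the second rule scores well below the first even though it classifies more examples correctly, which is the bias towards correct rules described above.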

5.  RESULTS 

NAX has shown competitiveness in evolving rule sets that
perform as accurately as the ones evolved by other genetics-based
machine learning and non-evolutionary machine learning
techniques. However, NAX's key element is the ability to
deal with large data sets. In this paper, we present preliminary
results towards evolving a model capable of correctly
classifying pixels as cancerous or non-cancerous. The original
array of spots is presented in Figure 3(a). Each spot corresponds
to a different biopsy sample from a patient. The
pixels present in each spot correspond to the epithelial tissue
of the biopsy; we suppress all other tissue types with
a prior classification filter based on Bayesian likelihood [7].
Each pixel of a spot is defined by 93 different metrics extracted
from the processed infrared spectra, as described
in section 3. Finally, each pixel in the array was labeled
with the diagnostic class provided by a human pathologist.
Figure 3(a) presents in green all the non-cancerous pixels
while red identifies cancerous ones.

Our goal with the initial experiments here was to demonstrate
the usefulness of the proposed approach to computer-aided
diagnosis. We are currently planning large-scale
experiments on several tissue arrays using the
Tungsten cluster at the National Center for Supercomputing
Applications. These initial experiments were conducted
on a dual core Intel Xeon 2.8 GHz Linux computer with 1 GB
of RAM. NAX was run using both processors. Training
a model describing all the data took less than
ten hours, indicating that very competitive training times
can be achieved by just using more processors. The obtained
model correctly classified > 99.99% of the
training pixels. However, these results do not illustrate
the generalization capabilities of the models evolved
by NAX. Hence, we ran a series of ten-fold stratified cross-validation
runs [34] to measure generalization and test performance
of the evolved models. It is important to mention
that tools such as WEKA [34] and other off-the-shelf data
miners were not able to handle the volume of data required
to evolve a model, either due to the large memory footprint
required or by not being able to provide an accurate
model in a feasible time period. In the cross-validation
experiments, NAX correctly classified 87.34%
of validation pixels. Such results are more than encouraging,
because they show a human-competitive computer-aided diagnosis
system is possible. Another interesting property is
that a few rules classify a large number of pixels (see Figure 4).
Such a result is interesting for the interpretability
of the model, since a small number of rules have great
expressiveness, and hence may provide valuable biological
insight. Most importantly, they allow us to classify tissue
accurately. Subsequent to this pixel-level classification, each
circular spot in Figure 3 was assigned as malignant or benign
based on the majority class of the pixels in the sample. We
were able to accurately classify 68 of 69 malignant spots and
70 of 71 benign spots in this manner. While human accuracy
is difficult to quantify due to the variation between
persons, a generally accepted anecdotal figure is an error rate
of about 5%. The preliminary results we demonstrate here
could potentially reduce that five-fold to about 1%, providing
a solution to this real-world problem by a combination
of novel spectroscopy and advanced machine learning.

Figure 4: Performance of the evolved model as a
function of the number of rules used.
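
The spot-level decision described above (assigning each spot the majority class of its pixels) and the resulting error rate can be sketched in a few lines of Python. The function names and class labels are hypothetical; only the counts 68 of 69 and 70 of 71 come from the text.

```python
from collections import Counter


def classify_spot(pixel_labels):
    """Assign a spot the most common class among its pixel-level labels.

    Ties are resolved in favor of the class encountered first, which is
    an arbitrary choice made for this sketch.
    """
    if not pixel_labels:
        raise ValueError("a spot needs at least one classified pixel")
    return Counter(pixel_labels).most_common(1)[0][0]


def spot_error_rate(correct_by_class, total_by_class):
    """Overall fraction of misclassified spots across all classes."""
    correct = sum(correct_by_class.values())
    total = sum(total_by_class.values())
    return 1.0 - correct / total


# Counts reported in the text: 68 of 69 malignant and 70 of 71 benign spots.
reported_error = spot_error_rate(
    {"malignant": 68, "benign": 70},
    {"malignant": 69, "benign": 71},
)
```

With the reported counts, 2 of 140 spots are misclassified, an error rate of roughly 1.4%, which is consistent with the figure of about 1% quoted above.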

6.  CONCLUSION 

In this manuscript, we present the application of advanced
genetics-based machine learning algorithms to a real-world
problem of large scope, namely, the diagnosis of prostate
cancer. As opposed to subjective human recognition of disease
in tissue using light microscopy, we employed a chemical
microscopy approach that required extensive computation
but provided a decision without human input. Our learning
algorithm, based on maximally general
and maximally accurate rules, scaled to very large data
sets and was parallelized to provide learning and classification
speed advantages. The algorithm was able to classify a majority
of pixels correctly, resulting in overall error rates that
were comparable to human examination, the current gold
standard of care.

INTRODUCTION

The integration of FTIR spectroscopy with microscopy facilitates recording of spatially resolved
spectral information, allowing the examination of both the structure and chemical composition of
a heterogeneous material. While the first such attempt was over 50 years ago,1 present-day
instrumentation largely evolved from the point microscopy detection of interferometric signals
that developed in the mid-80s.2 The successful coupling of interferometry for spectral recording
and microscopy for spatial specificity in these systems spurred interest in a variety of fields,
including the materials,3 forensic4 and biomedical arenas.5,6 Point microscopy utilizes an
aperture to restrict radiation incident on a sample and permits the recording of spatially localized
data. The primary utilities of this form of microscopy lay in acquiring accurate spectra from
small-size samples, in determining the chemical structure and composition of heterogeneous
phases at specified points and in building a two-dimensional map of the chemical composition of
samples. Since the data were acquired at a single point, composition maps could only be
acquired by rastering the sample. Hence, the approach was termed mapping or point mapping
and involved as many spectral scans as the number of pixels in the map.

The use of focal plane array (FPA) detectors for microscopy allowed for the acquisition of
large fields of view in a single interferogram acquisition sweep. The multichannel detection
enabled by array detectors was similar to the concept of recording images with charge coupled
devices in optical microscopy; hence, the approach was termed imaging. The unique advantages
of observing an entire field of view rapidly permitted applications that allowed monitoring of
dynamic processes, spatially resolved spectroscopy of large samples or many samples and
enhancement of spatial resolution due to retention of radiation throughput that was lost in point
microscopy systems due to diffraction at the aperture. Just as for the previous generation of
microspectroscopy instruments, applications rapidly followed in the materials9 and biomedical
fields.10-14 Research activity in this area can be divided into three major categories:
instrumentation and sampling methodologies, applications and data extraction algorithms. In this
manuscript, we review key advances and recent developments in the context of biomedical
imaging. We do not provide a comprehensive overview but selectively highlight certain features of
importance for cancer-related imaging. Last, we focus on one emerging application area, namely
tissue histopathology, and provide illustrative examples from our laboratory indicating the
integrative nature of the three categories in developing protocols.

INSTRUMENTATION, SAMPLING AND DATA HANDLING TECHNIQUES
Instrumentation

Since imaging is largely based on new detectors with unique performance characteristics for
spectroscopy, efforts in instrumentation have largely focused on the efficient integration of FPA
detectors with interferometers. Due to the size, different electronics and unique noise
characteristics of FPAs, optimization of the data acquisition methodology was a primary activity
in the initial period of availability of instrumentation. The first rational attempt at
understanding performance and optimizing the data acquisition process revealed the unique noise
characteristics that limited the first generation of array detectors.15 Briefly, this paper established
that the general behavior of FTIR spectrometers largely holds for imaging spectrometers but
the detector may serve to limit the applicability of established practices in IR spectrometry. An
explicit optimization of the data acquisition time revealed several strategies for speeding data
collection for both the step-scan and rapid-scan modes.16 The first example of rapid-scan FTIR
imaging was conducted using asynchronous sampling, followed by descriptions of
synchronously triggered sampling and generalized methodologies18 that could use any detector at
any modulation frequency using post-acquisition techniques. Advances in detector technology
have now allowed rapid-scan imaging to become routine for large FPA detectors, while
innovative new detectors have been developed (first by PerkinElmer) that trade off the large
multichannel detection advantage of arrays against the speed of smaller detector arrays to
provide a very high performance instrument.19

At present, rapid-scan imaging has become the mode of choice for most manufacturers and
detector sizes have proliferated from the classic 64 x 64 format to range from 16 x 1 to 256 x 256
formats (see figure 1). While the smaller detectors require rastering to image most samples and
can provide data of higher quality more efficiently, larger detectors are generally employed for
their large field of view and are useful for studying dynamics. It is interesting to note that the
linear array approach has an entirely different detector technology and considerations for
electronics compared to the two-dimensional FPAs. While it is beyond the scope of this article to
discuss the differences, the use of “macro” electronics that are offset from the actual detector and
the AC mode of operation are the two major differences that affect data. Consequently, comparisons
in performance are slightly more complicated. On the large format FPA front, the latest advance
seems to be a detector developed jointly by NIH and FBI personnel in 2005. The detector can
operate at 16 kHz for 128 x 128 pixel snaps (Bhargava, Levin, Perlman and Bartick,
unpublished). This is in the speed regime of single element detectors. Hence, the development
can truly lead to the acquisition of an entire image in a single interferometer mirror sweep in the
same time that it takes to acquire 1 spectrum with a benchtop IR spectrometer. To handle the
large data output, we designed on-chip co-addition and various corrections. We believe that
similar detector systems, operating in a fast regime and integrating processing with electronics,
are likely to be the technology of tomorrow for FTIR imaging.

The wide variety of instrumentation makes comparisons difficult, especially when manufacturers
provide different specifications for instruments. We have proposed a comparison index for these
systems based on performance per unit time. Recognizing that spectral resolution, time for
scanning, data processing (e.g. apodization) and resultant image size are the primary
determinants of performance, a measure can be formulated to describe performance. For a fixed
data processing scheme (filtering, apodization etc.), the time taken to acquire 1 megapixel of
data at 8 cm⁻¹ resolution and a signal to noise ratio (SNR) of 1000:1 is found to be a good
measure. We would like to emphasize that this is the performance of the entire
imaging spectrometer and not of the detector alone. Efficient coupling of the interferometer
and optimization of the optical train will both affect performance, as will the correct setup of the
experiment. This index also does not consider the ease of use or “user-friendliness” of systems.
These are other important considerations and must also be weighed by organizations interested
in FTIR imaging technology. The issue of time resolution for acquiring data is one such concern.
The first approach is the kinetics approach, in which the interferometer is repeatedly scanned and
imaging data sets are sequentially acquired as quickly as possible. Clearly, rapid scan is favored
and the availability of fast readout detectors is mandatory for fast events. The limit to this
method is the readout speed of the array (frames in ms), as interferometers can generally be
scanned fast enough and the integration time required is typically in the tens of microseconds
regime. An example is shown in figure 2 to demonstrate applicability in monitoring
polymerization kinetics.
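
To make the comparison index concrete, the following Python sketch normalizes a measured acquisition to the reference conditions named above (1 megapixel at an SNR of 1000:1). The text does not give a formula, so the scaling rules here are assumptions: SNR is taken to improve with the square root of acquisition time (as with co-adding scans), the time is taken to grow linearly with the number of fields of view needed to reach 1 megapixel, and the measurement is assumed to have been made at 8 cm⁻¹ resolution with the fixed processing scheme. The function name and arguments are hypothetical.

```python
def time_per_megapixel(measured_time_s, measured_snr, pixels_per_frame,
                       target_snr=1000.0, target_pixels=1_000_000):
    """Estimate the time needed to collect 1 megapixel of data at SNR 1000:1.

    ASSUMPTIONS (not from the source): SNR scales with the square root of
    acquisition time, and time scales linearly with the number of detector
    fields of view required to accumulate the target pixel count.
    """
    if measured_snr <= 0 or pixels_per_frame <= 0:
        raise ValueError("SNR and pixel count must be positive")
    coadd_factor = (target_snr / measured_snr) ** 2  # extra averaging required
    frames_needed = target_pixels / pixels_per_frame  # fields of view required
    return measured_time_s * coadd_factor * frames_needed


# Hypothetical instrument: a 64 x 64 array giving SNR 500 in 60 s per field of view.
index_seconds = time_per_megapixel(60.0, 500.0, 64 * 64)
```

A lower value indicates a faster system under these assumptions; as noted above, such a number still says nothing about ease of use.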

Though rapid-scan imaging has displaced the step-scan mode in most new instrumentation, a
very important application of the step-scan approach remains in time-resolved imaging.20-22
Briefly, the method is applicable to systems that can be repeatedly and reproducibly excited and
relax back to their ground state. At each mirror retardation, the FPA is repeatedly triggered to
acquire data. At the same time, the sample is excited once and the dynamics of excitation and
decay of the excited state are monitored. Mirror stepping, data acquisition and sample excitation
are all precisely synchronized. Figure 3 demonstrates the synchronization. Time-resolved FTIR
imaging was first demonstrated using polymer-liquid crystal composites. Examples of the types
of data that may be obtained are also shown in figure 3. Last, the technology was extended to
provide significantly higher time resolution than could be obtained by the electronics of the
detector alone.23 While FPA detectors are slow compared to single point detectors used in
conventional FTIR spectroscopy, the cause is the need to read out data from several thousand
pixels and not the need to record data from all pixels. Hence, by staggering the data
recording time over multiple sample excitations, higher temporal resolution may be obtained.
With current detectors, a time resolution of ~30 µs should be possible.
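
The idea of staggering the recording time over repeated excitations can be illustrated with a small Python sketch. It is a schematic of interleaved sampling, not the published acquisition scheme: the function names are hypothetical, and the 1 ms frame period in the example is an assumed value chosen only because array readout is described above as being in the millisecond range.

```python
import math


def staggered_sample_times(frame_period_us, n_frames, n_excitations):
    """Sample times (µs after excitation) when the trigger delay is shifted
    by frame_period / n_excitations on each repeated excitation."""
    step = frame_period_us / n_excitations
    times = []
    for k in range(n_excitations):      # k-th excitation, trigger delayed by k * step
        for j in range(n_frames):       # frames are still read out one period apart
            times.append(k * step + j * frame_period_us)
    return sorted(times)                # merged record has spacing `step`


def excitations_needed(frame_period_us, target_resolution_us):
    """Number of staggered excitations needed to reach a target time resolution."""
    return math.ceil(frame_period_us / target_resolution_us)


# Assumed 1 ms frame period and the ~30 µs resolution quoted in the text.
n_repeats = excitations_needed(1000.0, 30.0)
```

Under these assumptions, about 34 repeated excitations per mirror retardation would be needed, which is why the method requires a sample that can be excited reproducibly.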

Sampling 

Interferometer  Issues 

Among the sampling configurations, the first clearly was the optimization of the microscope for
transmission and sampling. Unexpected issues were encountered in initial devices. For example,
the detector for the mono-wavelength laser provides a fringe pattern to allow for tracking mirror
retardation. The signal from this laser is measured by a small detector located at the center of the
beamsplitter (to minimize errors) with an arm that extends out to the edge. When imaged onto
the FPA, this laser detector leads to a pattern with low signal levels. Hence, the field of view is
not uniform, leading, in turn, to lower signal to noise ratios (SNR) for the affected region. Many
manufacturers, hence, have re-designed their spectrometers for imaging use. Another
manufacturer has avoided this issue by aligning their microscope to sample only the unaffected
part of the beam. Since the non-imaging spectrometer did not require imaging and the
interferometer was simply coupled to a microscope, these issues were slowly addressed.

Sampling Modes: Transmission, Transmission-reflection, Reflection and Attenuated Total Reflection

A vast majority of studies report the use of transmission sampling. Other major developments
have been the incorporation of reflective slides,24,25,26 the integration of ATR elements for both
microscopy and large sample imaging, integration of ATR technology with various sample
forming accessories, grazing angle accessories and multi-sample accessories. Reflective slides
actually result in reflection-absorption that allows the beam to pass through the sample twice, though
with a different phase and lower signal due to half the objective being used for transmitting light
to the sample and the other half being used to acquire light from it. A detailed theoretical
understanding of the confounding effects has not been published, though an example of a
possible data correction algorithm has been reported. ATR imaging is also highly prevalent and
available as attachments to conventional imaging microscopes, using the sample chamber of the
spectrometer and using it as a solid immersion lens.27 We discuss examples of ATR imaging next.

ATR 

In the Attenuated Total Reflection (ATR) mode, an IR transmitting crystal of precise geometry
and high refractive index is employed as a solid immersion lens. Light is totally reflected at the
sample-crystal interface and an evanescent field penetrates into the sample to provide the
interaction to be observed using the traveling wave. Since the sample interaction is largely
determined by the lens and not by the sample, precise and controlled depth of interaction is
available. The sample, however, needs to be in good contact to allow efficient coupling with the
evanescent wave. ATR imaging allows users to work with relatively thick sample sections that
do not require much sample preparation expertise or time. The first use of ATR imaging was
reported by Digilab in analyzing large samples that were not sectioned, as for transmission. ATR
imaging microscopy was demonstrated soon after,28 followed by other novel accessories. There
were other unpublished attempts that one of the authors is aware of: in 1999, for example,
Snively et al. (personal communication, unpublished) demonstrated imaging data from an
inverted ZnSe prism acting as a single bounce ATR. Soon after, we employed a Ge crystal but
found the signal to noise ratio of the imaging system of that time to be very poor. In addition to
the ease of sample preparation, another major advantage of ATR imaging lies in improving the
limited spatial resolution of transmission microscopy.29 The authors assessed that they were able
to achieve a spatial resolution of 1 µm with a Ge internal reflection element.
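
The controlled depth of interaction mentioned above is commonly estimated with the standard evanescent-wave penetration depth, d_p = λ / (2π n1 sqrt(sin²θ − (n2/n1)²)), where n1 and n2 are the refractive indices of the crystal and the sample and θ is the angle of incidence. This expression is textbook optics rather than something stated in this article, and the following Python sketch uses assumed, typical mid-IR values (Ge about 4.0, ZnSe about 2.4, an organic sample about 1.5, 45° incidence) purely for illustration.

```python
import math


def penetration_depth_um(wavelength_um, n_crystal, n_sample, angle_deg):
    """Evanescent-wave penetration depth (µm) for an ATR element.

    Uses the standard expression d_p = lambda / (2*pi*n1*sqrt(sin^2(theta) - (n2/n1)^2)).
    """
    theta = math.radians(angle_deg)
    term = math.sin(theta) ** 2 - (n_sample / n_crystal) ** 2
    if term <= 0:
        raise ValueError("angle is at or below the critical angle: no total internal reflection")
    return wavelength_um / (2 * math.pi * n_crystal * math.sqrt(term))


# Assumed, typical values: 10 µm light (1000 cm^-1), 45 degree incidence,
# sample refractive index 1.5, Ge ~ 4.0 and ZnSe ~ 2.4.
dp_ge = penetration_depth_um(10.0, 4.0, 1.5, 45.0)
dp_znse = penetration_depth_um(10.0, 2.4, 1.5, 45.0)
```

With these assumed values the depth is well under 1 µm for Ge and roughly three times larger for ZnSe, which illustrates why the interaction depth is set mainly by the crystal rather than by the sample thickness.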

Both micro and macro sampling have been extensively utilized.30 A spatial resolution of 3-4 µm
using a Ge ATR element was claimed based on more stringent criteria than used previously.29 Ge,
ZnSe and diamond30 crystals have been the materials of choice for most applications. In
particular, Kazarian and co-workers have extensively employed ATR-FTIR imaging for various
applications including drug release, polymer/drug formulations and biological systems.30-33 The
same group has provided other innovative sampling configurations for specific experiments,
including a compaction cell that allows compaction of a tablet directly on a diamond crystal with
subsequent imaging.34 The changes in the component distribution of a tablet consisting of hydroxypropyl
methylcellulose (HPMC) and caffeine upon contact with water were studied. In this manner,
conventional dissolution measurements were combined with a concurrent assessment of the
compacted tablet structure.35 As opposed to the organic solvent-polymer dissolution experiments
reported earlier, this configuration allows for easy handling and imaging of water-induced
dissolution. The setup can also provide high throughput analysis of materials under controlled
environments. A microdroplet sample deposition system was combined with a humidity control
device to image about 100 samples deposited on the surface of an ATR crystal simultaneously.
The approach was extended to 165 samples and was reported to study parallel dissolution of
formulations.37

Multi-sample  Accessories  and  Sampling 

While imaging the structure of materials has been the primary focus of FTIR imaging, a number
of applications utilize the imaging of multiple samples. The first examples were from the field of
catalyst research.38 Typically 2-12 samples could be imaged and analyzed under the same
conditions. High throughput validation or method development was the primary goal in these
studies. Tissue microarrays (TMAs) provide the same function in biomedical imaging. TMAs
consist of tens to hundreds of samples arranged in a grid format. This allows for easy
visualization of the structure and classification accuracy across many patients and the statistical
measures needed for rigorous validation. The primary utility of the multisample image in this
case is to provide wide-ranging sampling and convenient archiving or data storage, not
necessarily to provide a higher throughput.14,39 With the appropriate geometry, many samples
can be imaged to understand their dynamics in a concerted fashion. To accommodate the
samples, the field of view is often expanded. This results in a lower spatial resolution. For
imaging multiple samples, though, the spatial resolution can be conserved but temporal
resolution is restricted.

BIOMEDICAL  APPLICATIONS 
Bone 

Bone has been the tissue studied most by FTIR imaging. Bone composition changes with
development, environment, genetics, health and disease, is amenable to imaging at the resolution
length scale of imaging and has a limited chemical composition that is characterized using IR
spectroscopy.40 For almost 30 years until the late 1980s,41 bone structure was studied using
single element detectors in FTIR spectrometers. Typically, ground bone was analyzed using the
conventional KBr pellet method. This pellet method obviously destroyed local structures,
precluding an understanding of molecular variations due to disease. Nevertheless, it was
sensitive to chemical composition and did provide useful information. With microscopy and now
with FTIR imaging, sample integrity is maintained and spectral information can be acquired at
anatomically discrete sites. From the resulting spectra, several important pieces of
information can be obtained. For example: a) relative mixture composition of hydroxyapatite and
collagen by calculating the ratio of the integrated ν1, ν3 phosphate and amide I bands (mineral:matrix
ratio), b) carbonate substitution by calculating the carbonate/phosphate ratio from the
ratio of the integrated ν2 carbonate peak (850-900 cm⁻¹) and the ν1, ν3 phosphate contour (900-1200
cm⁻¹), c) crystallinity of the mineral phase from the ratio of the 1030/1020 cm⁻¹ peak intensities.42 These assays
illustrate several quantities important to bone research and disease diagnoses that can be readily
performed. Though a complete discussion is available in references 40 and 42-44, we pick three
illustrative examples demonstrating the applicability in disease and in research.
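
The three spectral parameters listed above can be computed from a single pixel spectrum along the following lines. This Python sketch is only an illustration: the function names are hypothetical, the phosphate and carbonate limits are the ones quoted in the text, the amide I integration limits (1585-1720 cm⁻¹) are an assumption because the text does not state them, and no baseline correction is applied.

```python
def band_area(wavenumbers, absorbance, lo, hi):
    """Trapezoidal area under the spectrum between lo and hi (cm^-1)."""
    pts = sorted((w, a) for w, a in zip(wavenumbers, absorbance) if lo <= w <= hi)
    return sum((w2 - w1) * (a1 + a2) / 2.0 for (w1, a1), (w2, a2) in zip(pts, pts[1:]))


def intensity_at(wavenumbers, absorbance, target):
    """Absorbance at the sampled wavenumber closest to target."""
    i = min(range(len(wavenumbers)), key=lambda k: abs(wavenumbers[k] - target))
    return absorbance[i]


def bone_parameters(wavenumbers, absorbance, amide_i=(1585.0, 1720.0)):
    """Mineral:matrix ratio, carbonate:phosphate ratio and crystallinity index.

    amide_i limits are an ASSUMPTION; the other limits follow the text.
    """
    phosphate = band_area(wavenumbers, absorbance, 900.0, 1200.0)   # nu1, nu3 phosphate
    carbonate = band_area(wavenumbers, absorbance, 850.0, 900.0)    # nu2 carbonate
    matrix = band_area(wavenumbers, absorbance, *amide_i)           # amide I
    return {
        "mineral_to_matrix": phosphate / matrix,
        "carbonate_to_phosphate": carbonate / phosphate,
        "crystallinity": intensity_at(wavenumbers, absorbance, 1030.0)
        / intensity_at(wavenumbers, absorbance, 1020.0),
    }


# Synthetic flat spectrum, used only to show the call; real spectra would
# come from one pixel of an FTIR image.
grid = [float(w) for w in range(800, 1801)]
flat = [1.0] * len(grid)
params = bone_parameters(grid, flat)
```

Applying the same function to every pixel yields images of the mineral:matrix ratio, carbonate substitution and crystallinity of the kind discussed in the examples that follow.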

IR spectral analysis of healthy and diseased bone has been reviewed by Boskey et al. with
particular emphasis on changes in bone composition, the physicochemical status of mineral and
matrix of bones during osteoporosis and the effect of therapeutics on these parameters.
Osteoporosis or porous bone is a bone disease characterized by low bone mass and structural
deterioration of bone tissue. This leads to bone fragility and an increased susceptibility to
fractures, especially at the hip, spine and wrist. FTIR images of the mineral content and
crystallinity in trabecular bone of normal and osteoporotic samples clearly depict that the
trabeculae in diseased tissue are thinner. Moreover, the mineral/matrix ratio in osteoporotic bone
is significantly reduced, whereas crystallinity is increased. These advances demonstrate the
potential and applicability of the technique to characterize diseased tissue. Bone mineral changes
between a healthy mouse model and a mouse model of Fabry disease (a lipid storage disease
in which globotriaosylceramide (Gb3) accumulates in tissues) were also analyzed.43 No significant
differences in the bone mineral properties were observed between Fabry and healthy mice, which
might reflect the similar lack of a major bone phenotype in human patients with Fabry’s disease
and may also be related to the developmental age of these animals. The study provides an
example of the applicability to laboratory research.

Calcified tissue in biopsies from adults with osteomalacia has been studied.44 Osteomalacia results
in a deficiency of the primary mineralization of the matrix, leading to an accumulation of osteoid
tissue and reduction in bone’s mechanical strength. A decrease in trabecular bone content with
absence of changes in matrix or mineral is noticed when iliac crest biopsies of individuals with
vitamin D deficient osteomalacia are compared to normal controls. These findings support the
assumption that, in osteomalacia, the quality of the organic matrix and of mineral in the centre of
the bone does not vary, whereas less-than-optimal mineralization occurs at the bone surface.

Brain 

Monkey brain tissues were among the first tissues examined using FTIR imaging.12
Lately, the field has experienced a renaissance with applications to the human brain.
Grossly, brain can be divided into two types of matter, namely gray matter and white matter.
These names derive simply from their appearance to the naked eye. Gray matter consists of cell
bodies of nerve cells while white matter consists of the long filaments that extend from the cell
bodies - the "telephone wires" of the neuronal network, transmitting the electrical signals that
carry the messages between neurons. A visualization of the two compartments formed the first
demonstrative application of FTIR microspectroscopic imaging.

FTIR imaging and multivariate statistical analysis (unsupervised hierarchical cluster analysis)
were applied along with histology and immunohistochemistry in an animal model of
glioblastoma multiforme (GBM).45 GBM is a highly malignant human brain tumor that is
considered to be one of the most difficult to treat effectively.46 The authors were able to identify
the tumor growth as chemically distinct from the surrounding brain tissue. For healthy brain, the
distribution of the amide I absorbance in images highlighted high concentrations of proteins in
the corpus callosum and regions of the basal ganglia. Low absorbance was generally observed
in the cortex, whilst a higher absorbance was observed at the outer layer of the cortex. For a
GBM-bearing animal, the highest absorbance was found at the tumor site. In contrast to healthy
brain, a lower absorbance of the amide I band was observed at the corpus callosum when compared
to that in the cortex and the caudoputamen. The study demonstrates a powerful application of
simple analyses that can indicate disease. It also highlights the multitude of spatial and spectral
clues that can be used to diagnose or understand the disease.
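
The amide I absorbance maps described above are univariate images: each pixel is reduced to the
absorbance integrated over a single band. The following is a minimal sketch of that reduction, not
the procedure of the cited study. It assumes the data are held as a NumPy array of shape
(rows, columns, wavenumbers); the function name `band_image`, the 1600-1700 cm⁻¹ window and the
synthetic cube are illustrative assumptions.

```python
import numpy as np

def band_image(cube, wavenumbers, lo, hi):
    """Return the baseline-corrected band area (lo..hi, in cm^-1) per pixel."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    idx = np.where((wavenumbers >= lo) & (wavenumbers <= hi))[0]
    idx = idx[np.argsort(wavenumbers[idx])]      # ascending wavenumber
    x = wavenumbers[idx]
    band = np.asarray(cube, dtype=float)[..., idx]
    # Straight-line baseline joining the two edges of the window
    frac = (x - x[0]) / (x[-1] - x[0])
    baseline = band[..., :1] + (band[..., -1:] - band[..., :1]) * frac
    corrected = band - baseline
    # Trapezoidal integration along the spectral axis
    widths = np.diff(x)
    return np.sum(0.5 * (corrected[..., 1:] + corrected[..., :-1]) * widths, axis=-1)

# Synthetic 4 x 4 pixel cube (assumed data): top half low protein, bottom half high protein
wn = np.arange(900.0, 1801.0, 4.0)
amide_i = np.exp(-0.5 * ((wn - 1655.0) / 15.0) ** 2)
cube = np.zeros((4, 4, wn.size))
cube[:2] = 0.2 * amide_i
cube[2:] = 0.8 * amide_i
image = band_image(cube, wn, 1600.0, 1700.0)
print(image.round(2))
```

Because every step is linear in absorbance, the resulting image scales directly with the band
intensity, which is what makes such maps easy to interpret.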

In addition to primary disease sites, the diagnosis of metastatic spread from various cancers was
also reported.47 A multivariate classification algorithm was used to distinguish normal tissue from
brain metastases successfully and to classify the primary tumor of brain metastases from renal
cell carcinoma, lung cancer, colorectal cancer, and breast cancer. In the cluster-averaged IR
spectra from a brain metastasis of renal cell carcinoma, the main spectral differences among the
three tissue regions were observed from 950 to 1200 cm⁻¹ and from 1500 to 1700 cm⁻¹. Band
intensities at 1026, 1080 and 1153 cm⁻¹ are at a maximum in the spectrum of the black cluster
and at a minimum in the spectrum of the light gray cluster. The IR spectra of normal brain tissue
and of brain metastases of lung, breast and colorectal cancer were compared, and it was found
that these spectra do not contain the spectral features at 1026, 1080 and 1153 cm⁻¹ that are
indicative of the presence of glycogen. It was concluded that these spectral features could be
considered biomarkers for brain metastases of primary renal cell carcinoma. In addition to these
three bands, spectral differences were observed for the bands at 1542 and 1655 cm⁻¹, owing to
the amide II and amide I vibrations, respectively. It is clear from the results that the maximum
protein concentrations correlate with minimum glycogen concentrations in the IR image. However,
the protein and glycogen properties evident in the IR image are not visible in the unstained
cryosection. It is noteworthy that simple univariate analyses provide the end clues to the disease.
Even on application of multivariate techniques, the most prominent and easy-to-understand
biomarkers of disease are those defined by conventional spectroscopic knowledge as being
important for identification, namely, features and their absorption.

In the cluster-averaged IR spectra of white matter from the three normal brain tissue samples,
intense bands at 1060, 1233, 1466, 1735, 2850 and 2920 cm⁻¹, due to the high lipid concentration
in white matter, were noticed. Intensity changes were due to inter-sample and patient-to-patient
variances of the same tissue type. In addition, cluster-averaged IR spectra of brain metastases
(of renal cell carcinoma, breast cancer, lung cancer, and colorectal cancer) and of gray matter of
normal brain tissue were compared after baseline subtraction and normalization with respect
to the amide I band. Significant differences in band positions, intensities and areas were
observed between these samples, which were then used as potential candidates to differentiate
normal and tumor tissue and to identify the primary tumor. Here, the authors used only
eight spectroscopic features for the LDA model. They were able to classify correctly three out of
three normal brain tissue samples and 16 out of 17 brain metastasis samples. Hence, though
univariate analyses and features provide useful recognition, their integration into a multivariate
algorithm provides for automated recognition of clinical importance. It may also be argued,
however, that it is questionable whether the small numbers of samples employed represent a true
performance condition for the algorithm or are simply reflective of bias arising from the clinical
setting or sample sources. The advent of faster imaging approaches and advanced sampling
techniques like tissue microarrays (TMAs) can allow larger numbers of samples to be analyzed and
such doubts about the validity of studies to be put to rest.
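
The concern about small sample numbers can be made concrete with a leave-one-out test of a linear
discriminant classifier. The sketch below is a generic pooled-covariance LDA written with NumPy,
not the model of the cited study. The synthetic data only mirror the sample counts mentioned above
(3 and 17 samples, eight features), and all function names are hypothetical.

```python
import numpy as np

def fit_lda(X, y):
    """Fit LDA with a pooled covariance matrix shared by all classes."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(X) - len(classes))
    # Small ridge term keeps the covariance invertible when samples are few
    cov += 1e-6 * np.trace(cov) / X.shape[1] * np.eye(X.shape[1])
    log_priors = np.log(np.array([(y == c).mean() for c in classes]))
    return classes, means, np.linalg.inv(cov), log_priors

def predict_lda(model, X):
    classes, means, cov_inv, log_priors = model
    # Linear discriminant score: x^T S^-1 m - 0.5 m^T S^-1 m + log prior
    quad = 0.5 * np.sum(means @ cov_inv * means, axis=1)
    scores = X @ cov_inv @ means.T - quad + log_priors
    return classes[np.argmax(scores, axis=1)]

def leave_one_out_accuracy(X, y):
    """Refit the model with each sample held out and score that sample."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        model = fit_lda(X[keep], y[keep])
        hits += int(predict_lda(model, X[i:i + 1])[0] == y[i])
    return hits / len(X)

# Synthetic, well-separated data (assumed): 3 "normal" and 17 "metastasis" samples
rng = np.random.default_rng(0)
n_features = 8
X = np.vstack([rng.normal(0.0, 1.0, (3, n_features)),
               rng.normal(3.0, 1.0, (17, n_features))])
y = np.array([0] * 3 + [1] * 17)
accuracy = leave_one_out_accuracy(X, y)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

With only three samples in one class, each held-out normal sample is predicted from a class mean
estimated on two spectra, which illustrates why larger sample sets are needed for stable estimates.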

Similarly, tissues from rat glioma models have been characterized and used to discriminate
healthy from tumor sections using principal component analysis and K-means clustering.48 The
pseudo-color maps reported were constructed from K-means clusters, where each cluster consists
of similar spectra. The lipid/protein ratio (1466/1452 cm⁻¹) was found to be decreased, and the
band at 1740 cm⁻¹ became weak and almost vanished, as compared to the corresponding bands in the
healthy tissue. In addition to the above-mentioned differences, significant differences between
healthy and tumor-affected tissue were observed in the fingerprint region. In the healthy tissue, a
weak band at 1172 cm⁻¹, representing the stretching mode of C-O groups, was observed.
Reduced intensity as well as a shift of the peak to 1190 cm⁻¹ was noted for tumor and
tumor-surrounding spectra. Tumor tissue was observed to show a decreased intensity of the
asymmetric phosphate stretching and C-C stretching and an increased intensity of the symmetric
phosphate stretching when compared to the healthy tissue. Variations in lipid features (methylene
and methyl stretching) were also observed. The major point here is that the entire spectrum
contains numerous points of difference between healthy and diseased tissue. Results were found to
be in agreement with those obtained from pathology.49 A structural difference around the tumor was
noted, which could be ascribed to the peritumoral edema observed during glioma development.
An increase in the permeability of the blood-brain barrier and aggravation of the mass effect of
tumors are the rationale for the edema that is associated with brain tumors. Fundamental
understanding can be enhanced by a complete understanding of the spectral differences, but
prediction algorithms need only a few measures of the spectral data to be effective.
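
Cluster maps of the kind described above can be sketched with a plain K-means loop: every pixel
spectrum is assigned to its nearest cluster center, and the label image is then shown in pseudo
color. This is a minimal NumPy illustration on synthetic spectra, not the processing used in the
cited work; the function name `kmeans_map` and the two synthetic tissue spectra are assumptions.

```python
import numpy as np

def kmeans_map(cube, k, n_iter=50, seed=0):
    """Cluster the pixel spectra of a (rows, cols, bands) cube into k groups."""
    rows, cols, n_bands = cube.shape
    spectra = cube.reshape(-1, n_bands).astype(float)
    rng = np.random.default_rng(seed)
    centers = spectra[rng.choice(len(spectra), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every spectrum to every center
        dist = ((spectra[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = spectra[labels == j]
            if len(members):                     # keep old center if cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels.reshape(rows, cols), centers

# Synthetic 6 x 6 image (assumed): left half keeps an ester band near 1740 cm^-1,
# right half has lost it, loosely mimicking the healthy/tumor contrast in the text
rng = np.random.default_rng(0)
axis = np.linspace(1000.0, 1800.0, 50)
protein = np.exp(-0.5 * ((axis - 1650.0) / 30.0) ** 2)
ester = 0.5 * np.exp(-0.5 * ((axis - 1740.0) / 20.0) ** 2)
cube = np.empty((6, 6, axis.size))
cube[:, :3] = protein + ester
cube[:, 3:] = protein
cube += rng.normal(0.0, 0.01, cube.shape)
label_map, centers = kmeans_map(cube, k=2)
print(label_map)
```

The cluster numbers themselves are arbitrary; only the grouping of pixels carries meaning, so the
map must still be interpreted against histology.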


Breast 

Two major applications in breast tissue deal with complications arising from artificial alterations
of the tissue and with the evolution of cancer. While breast augmentation by implants is highly
prevalent, its complications have been discussed only more recently. On the other hand, the
conventional method for diagnosing breast disease and predicting its course is a
histopathological examination of biopsy samples, a practice that has some shortcomings. For
breast implants, a major question is the containment of the filling material, as its leakage can
potentially lead to disease. The silicone gel in implants is chemically very different from the
surrounding tissue, and its presence in tissue sections indicates a definite leak from the implant,
due either to material failure or as a consequence of aging. A spectroscopic image50 generated
from the asymmetric stretching modes of the methyl groups attached to silicon in the gel allowed
for the examination of silicone in the tissue. Due to the unique chemical contrast employed in FTIR
imaging, such presence can be discerned within the tissue even when optical microscopy
contrast is poor. An example of the presence of Dacron (a commercial name for polyethylene
terephthalate) fixative patch threads in breast tissue was also shown.50 It was noted that the
technique is capable of rapid analysis within minutes of sectioning the tissue.

A few reports have also applied FTIR imaging to diagnosing breast disease. Breast tumor
tissues were characterized by both FTIR imaging and point-mapping techniques, and the advantages
of each over the other were evaluated.51 Similar comparisons had previously been reported for
polymeric materials, analyzing both static and dynamic samples. Comparing images from the two
methods, imaging data provided a clearer structure in the tumor area than the data obtained from
point mapping. Since breast tumor cells are ~10 µm in diameter, point-mapping data (with an
aperture of 30 µm) would always contain the spectrum of tumor cells as well as contributions
from other components surrounding the cells. The study clearly indicated that the
conventional point-mapping approach can fail to detect a small number of malignant cells due to
its poor spatial resolution. Nevertheless, the contamination problem, i.e., the spectral
contribution of other components surrounding the cell, was found to be less severe in the case of
ductal carcinoma in situ (DCIS). The study illustrates the need for matching the appropriate level
of spatial resolution to the task. While the 30 µm resolution may be appropriate for some
applications, it was clearly insufficient for detecting smaller numbers of cells.
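
The dilution caused by a large aperture can be illustrated numerically. The sketch below averages
a synthetic single-band image over non-overlapping 30 µm apertures and compares the result with
the pixel-level image. The 5 µm pixel size, the absorbance values and the function name
`simulate_point_mapping` are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_point_mapping(image, pixel_um, aperture_um):
    """Average an image over square, non-overlapping apertures."""
    n = int(round(aperture_um / pixel_um))       # pixels per aperture side
    rows = (image.shape[0] // n) * n
    cols = (image.shape[1] // n) * n
    blocks = image[:rows, :cols].reshape(rows // n, n, cols // n, n)
    return blocks.mean(axis=(1, 3))

# Assumed scene: background absorbance 0.1 and one ~10 um cell (2 x 2 pixels) at 1.0
pixel_um, aperture_um = 5.0, 30.0
image = np.full((36, 36), 0.1)
image[6:8, 6:8] = 1.0
mapped = simulate_point_mapping(image, pixel_um, aperture_um)
print("imaging maximum:      ", image.max())
print("point-mapping maximum:", mapped.max().round(3))
```

In this example the cell occupies 4 of the 36 pixels under one aperture, so its signal is averaged
down from 1.0 to 0.2, much closer to the background level.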

Artificial neural network (ANN) and K-means cluster analyses have also been employed for the
classification of FTIR imaging data from normal and malignant immortalized human breast cell
lines.53 Normal cells, carcinoma cells, and mixtures of normal and carcinoma cells were used.
Differences in the spectral backgrounds between the training and test data were observed, which
confound the reproducibility of recorded spectra and, thus, cause the classifier to fail. Using
rejection thresholds in the application of the ANN classifier was reported to be helpful in
identifying doubtful classifications. Another study54 reported imaging fibroadenoma, a benign
breast tumor. Data were evaluated using unsupervised cluster analysis utilizing two spectral
regions, namely 1000-1500 and 2800-3000 cm⁻¹. The distributions of four main tissue components -
epithelium, retronuclear basal epithelial regions, mantle zone and distant connective tissue - were
visualized. The spectral features from each component were discussed in detail. Furthermore,
comparing epithelia from fibroadenoma and DCIS, the authors determined that subtle
distinctions between the IR characteristics of these two are reproducible. The initial study used
tissue from a single patient.
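
A rejection threshold of the kind mentioned above for the ANN classifier can be applied to any
classifier that returns class probabilities. The sketch below is a generic illustration, not the
published implementation; the function name, the threshold of 0.8 and the example probabilities are
assumptions.

```python
import numpy as np

def classify_with_rejection(probabilities, threshold, reject_label=-1):
    """Assign the most probable class, or reject_label if it is not confident enough."""
    probabilities = np.asarray(probabilities, dtype=float)
    winners = probabilities.argmax(axis=1)
    confident = probabilities.max(axis=1) >= threshold
    return np.where(confident, winners, reject_label)

# Assumed classifier outputs for three spectra and two classes (normal, carcinoma)
probs = [[0.95, 0.05],
         [0.55, 0.45],
         [0.10, 0.90]]
labels = classify_with_rejection(probs, threshold=0.8)
print(labels)
```

Spectra flagged with the reject label can then be set aside for manual review rather than being
forced into one of the classes.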

The work was recently extended55 to diagnose benign and malignant lesions from 22 patients.
The study utilized only spectra from well-defined tumor areas owing to the heterogeneity of
tissues. Based on the cluster analysis and on comparison with the H&E images, four classes of
distinct breast tissue spectra were identified - fibroadenoma (FA), ductal carcinoma in situ
(DCIS), connective tissue and adipose tissue. Further, ANNs were developed as an automated
classifier to differentiate the four classes. All spectra of connective tissue and adipose tissue
were classified correctly, as their spectral features are clearly different from each other and
from those of tumors. Differentiating fibroadenoma from DCIS was more difficult. A
top-level/sublevel strategy employing principal component analysis was further applied and
correctly differentiated 93% of the fibroadenoma and DCIS spectra. From the mean spectra, it was
found that DCIS has more lipid content than fibroadenoma. Invasive ductal carcinoma (IDC) could not
be well characterized due to contamination from surrounding cells, illustrating the limited spatial
resolution.

Cervical  Cancer 

The cervix is the lower part of the uterus (womb), in which two major types of cancer occur:
squamous cell carcinoma and adenocarcinoma. About 80% to 90% of cervical cancers are
squamous cell carcinomas, and the remaining 10% to 20% are adenocarcinomas. Less commonly,
cervical cancers have features of both squamous cell carcinomas and adenocarcinomas. These
are called adenosquamous carcinomas or mixed carcinomas. Typically, the Papanicolaou (Pap)
test checks for changes in the exfoliated cells of the cervix to find the presence of any infection,
abnormal (unhealthy) cervical cells, or cervical cancer. FTIR spectroscopy, microspectroscopy
and FTIR imaging have been widely utilized to study cervical cancer and to perform the same
function using computer analyses of spectra.26, 56-60 While the first reports on diagnosing cervical
cancer are now generally not regarded as leading to solutions,56 two groups have provided
definitive proof of the potential of IR spectroscopy through careful microscopy studies.26, 57, 45, 59, 60
While FTIR images of the amide I and νasy PO2⁻ bands, compared with the H&E-stained image,
showed only a rough correlation with the pathological features or cell types, cluster
maps of two, five and eight clusters resulting from unsupervised hierarchical cluster (UHC)
analysis of the whole spectrum demonstrated good segmentation. With five clusters, most cell
types are apparent upon correlation with the stained image, including superficial (1),
intermediate (2), parabasal (3), and connective tissue (5). As in the univariate images, the
connective tissue region (5) is split into two clusters. Furthermore, by comparing the UHC analysis
of the whole spectrum with that of only the amide I region, the authors demonstrated that
minimizing the spectral region for analysis and using fewer clusters does not lead to the loss of
useful information. Both univariate and multivariate FTIR images of a sample with several
endocervical ducts within the connective tissue were shown. These endocervical ducts, lined with
columnar endocervical cells, were apparent in all of those images, even with only two clusters.

Cultures derived from cervical cancer cells (HeLa) are one of the most popular model systems
and have been studied using FTIR imaging.61 The cells were grown directly as sparse
monolayers on low-e slides. An FTIR image of the amide I band region was shown, in which large
differences in the spectral intensities associated with the cells were observed even though these
cells were from a homogeneous, exponentially growing cell culture. Cluster analyses of normalized
spectra showed distinct differences that were not appreciated in the univariate image. Similarly,62
IR imaging with fuzzy C-means clustering and hierarchical cluster analysis was utilized to study
thin sections of cervix uteri encompassing normal, precancerous and squamous cell
carcinoma tissue. These studies demonstrate that IR imaging, in combination with multivariate
techniques, is capable of segmenting cervical tissues in a manner that is comparable to
H&E-stained image differentiation and is significantly more sensitive in terms of the chemical
composition of the cells - whether it be due to metabolic or disease reasons.


Prostate 

Prostate cancer is the most prevalent internal cancer in the US.63 Hence, its pathologic diagnosis
and the correct interpretation of disease state are crucial.64 FTIR imaging has been proposed as a
solution that can potentially help pathologists by providing an objective and reproducible
assessment of disease in a manner that is easily understood by clinicians. The prostate is also a
good model system for the development of FTIR imaging protocols. We first review progress in the
field and then describe efforts in our and our collaborators' laboratories towards formulating a
practical algorithm for prostate cancer pathology. While a number of studies have examined human
prostate tissue with IR spectroscopy, microscopy approaches have recently been extensively utilized
both to study fundamental properties of prostate tissue and to determine structural units in
normal and disease states.69-75 An understanding of the tissue is now emerging as a result of these
studies. While the fundamental properties of the tissue are being examined, we have focused on
developing statistically validated diagnostic methods.

We have utilized high-throughput imaging with the express purpose of correlating spectra to
clinical practice.39, 64, 76 It is instructive to first examine the approaches of some previous
studies and then describe our approach in some detail. A variety of techniques have been reported
for analyzing prostate tissue, including unsupervised multivariate data analysis techniques such as
agglomerative hierarchical (AH) clustering, fuzzy C-means (FCM) clustering, or k-means (KM)
clustering to construct infrared spectral maps of tissue structures.77 The results from these
multivariate techniques confirmed those of standard histopathological techniques and were found to
be helpful for identifying and discriminating tissue structures. Agglomerative hierarchical
clustering was found to be the best among the cluster imaging methods in terms of segmenting the
tissue. While these techniques comprise one end of the range of approaches, using large spectral
regions and completely objective methods, the other extreme has also proven to be useful. In the
second paradigm, careful examination of the spectral data yields some measures that prove useful.
For example, the ratio of peak areas at 1030 and 1080 cm⁻¹, corresponding to the glycogen and
phosphate vibrations respectively, was utilized as a diagnostic marker for the differentiation of
benign from malignant cells.69 The authors summarized that the use of this ratio in association with
FTIR spectral imaging provides a basis for estimating areas of malignant tissue within defined
regions of a specimen. While it may be argued that the former is not based on clinical knowledge
and is more suited for discovery, it also involves the choice of a specific number of
clusters and their subsequent interpretation. The latter is based on a single parameter whose
utility for universal diagnoses remains to be tested. Nevertheless, these studies indicate that both
approaches provide useful information about the tissue.

Our approach has used elements from both pattern recognition and spectroscopic analyses of
univariate measures.39, 76 In all cases, one starts with the acquired imaging data (figure 4). Since
the data set is large (typically 10-1000 GB), it is advisable to reduce the dimensionality of the
data using some numerical procedure. Compression algorithms, principal components analyses or
simply storing only the information needed for classification (if the algorithm is known) are
useful. We sought expressly to relate the recorded IR imaging data to the clinical knowledge base.
Hence, we started with a model that is derived from clinical practice. Clearly, the approach limits
the discovery of new knowledge, but it assures the clinician that all quantities of importance for
diagnoses will be considered. The acquired data are labeled with known cell identities or disease
states. These pixels are best identified by a combination of very careful manual labeling and tests
for absorbance fidelity. Spectra from the labeled regions are employed via average values,
medians and standard deviation analyses to determine a set of spectral features that are
descriptive of the major features of all spectra. We first note that the characteristic IR absorbance
spectra of the ten histological classes comprising prostate tissue look similar. Though small
differences in spectral features were observed at many frequencies, summary statistics are
limited in their examination of spectra for classification. Further, the small differences indicate
that noise and biological variability may render univariate measures less reliable. The large
number of classes usually implies that univariate analyses cannot distinguish all histological
classes present in the tissues, and hence the need for multivariate analyses is apparent. Here the
similarity of the spectral features for all classes works in our favor. Very similar baseline points
are obtained from an analysis of all spectra, and only subtle feature differences are noted to
distinguish the various class spectra. Hence, unknown spectra can be processed in the same
specified manner, without introducing any bias. Each of these features is termed a metric to
denote that it is a useful measure of the spectrum. Individual metrics can allow segmentation of
various tissue types if they are sufficiently different in a sampled population.

We then employ the equivalent of a t-test, in that the overlap between the absorbance
distributions of metrics is determined and equated to the error in prediction. The metrics are
arranged in order of increasing overlap. Hence, we have an ordered set of metrics that
differentiates at least two classes. To obtain overall accuracy, we employ a modified Bayesian
algorithm to provide the probability of each class for every pixel. This fuzzy result is employed to
determine the area under the curve (AUC) of a receiver operating characteristic (ROC) curve. The
ROC curve is built by accepting the probability of each class at an increasing threshold that varies
between 0 and 1. For optimized threshold values, the fuzzy classification is turned into a
classified image, where each pixel is assigned a distinct class. We note that the method
incorporates analysis of all spectral features, a selection of the best features based on
statistical analysis of the data and an optimal prediction of the class of each pixel based on an
objective selection rule from the fuzzy classification. The method is very powerful in that it
employs as metrics spectral features that are ordinarily employed by spectroscopists, which permits
a spectroscopic analysis of the basis of decision-making. Further, the method explicitly obtains the
fuzzy rule data for final classification. The value of the rule data for each class is actually the
probability of belonging to the class without consideration for the prior prevalence of the class.
Hence, the method allows direct comparisons between performances for different classes. The
dependence of the process on various experimental parameters has also been reported.
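
Two steps of the procedure above lend themselves to a short numerical sketch: ranking metrics by
the overlap of their class distributions, and computing the AUC by sweeping a threshold from 0 to
1 over per-pixel class probabilities. The code is a simplified illustration on synthetic numbers.
It does not reproduce the modified Bayesian algorithm itself, and the histogram-based overlap
estimate, the function names and the example data are assumptions.

```python
import numpy as np

def distribution_overlap(a, b, bins=50):
    """Overlap (0 to 1) of the normalized histograms of two samples."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    ha, _ = np.histogram(a, bins=bins, range=(lo, hi))
    hb, _ = np.histogram(b, bins=bins, range=(lo, hi))
    return np.minimum(ha / ha.sum(), hb / hb.sum()).sum()

def rank_metrics(X, y):
    """Order metric columns by increasing overlap between classes 0 and 1."""
    overlaps = np.array([distribution_overlap(X[y == 0, j], X[y == 1, j])
                         for j in range(X.shape[1])])
    return np.argsort(overlaps), overlaps

def roc_auc(scores, is_positive, n_thresholds=101):
    """Area under the ROC curve from a threshold sweep between 0 and 1."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    tpr = np.array([(scores[is_positive] >= t).mean() for t in thresholds])
    fpr = np.array([(scores[~is_positive] >= t).mean() for t in thresholds])
    # Rates fall as the threshold rises; reverse and close the curve at (0, 0)
    fpr = np.concatenate([[0.0], fpr[::-1]])
    tpr = np.concatenate([[0.0], tpr[::-1]])
    return float(np.sum(0.5 * (tpr[1:] + tpr[:-1]) * np.diff(fpr)))

# Two synthetic metrics (assumed): the first separates the classes, the second does not
rng = np.random.default_rng(1)
y = np.array([0] * 200 + [1] * 200)
X = np.column_stack([
    np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(6.0, 1.0, 200)]),
    rng.normal(0.0, 1.0, 400),
])
order, overlaps = rank_metrics(X, y)
print("metric order:", order, "overlaps:", overlaps.round(3))

# Assumed class probabilities for six pixels, three of which truly belong to the class
scores = np.array([0.9, 0.8, 0.7, 0.1, 0.2, 0.3])
is_positive = np.array([True, True, True, False, False, False])
auc = roc_auc(scores, is_positive)
print("AUC:", auc)
```

A small overlap corresponds to a small expected prediction error for that metric, so the metrics
at the front of the ordering are the natural first candidates for the classifier.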

The complication inherent in translating results from a small data set of patients to clinical
applications is well recognized in the spectroscopy community. The variability in data, arising
from variations within and between patients and from sample preparation and handling, is likely to
provide noisy estimates of performance. Hence, statistical stability may be obtained by
examining a large number of samples. Similarly, a large number of patients may be employed to
provide calibration models, likely improving the robustness of the developed algorithm. We have
described a high-throughput sampling method for tissues.14, 39, 76 Briefly, the approach uses a
combinatorial sampling of tissue type and pathology to first acquire small sections of tissue
from large archival cases. These small sections are arranged in a grid pattern and placed on the
same substrate. The sample is termed a tissue microarray to reflect the similarity with cDNA
microarrays. For spectroscopic imaging and the development of automated algorithms, the
approach provides a large number of cases that can be used both for building accurate prediction
algorithms and for extensive validations. The same approach is likely to prove useful for
extensions to determining pathology. Figure 5 demonstrates the typical workflow of a validation
algorithm and the methods used for statistical comparison. We strongly suggest using a variety of
methods for measuring performance, as each method has its own advantages and disadvantages. For
example, summary measures from ROC curves only provide information about accuracy but do
not indicate which class the inaccuracies arise from. Similarly, confusion matrices provide
cross-class information but do not provide global performance measures in the mold of ROC curves.
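
The complementary view offered by a confusion matrix can be sketched as follows. The class labels
and predictions are made-up values used only to show the bookkeeping, and the function names are
hypothetical.

```python
import numpy as np

def confusion_matrix(truth, predicted, n_classes):
    """Count pixels by (ground-truth class, predicted class)."""
    truth = np.asarray(truth, dtype=int)
    predicted = np.asarray(predicted, dtype=int)
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (truth, predicted), 1)     # accumulates repeated index pairs
    return matrix

def row_percentages(matrix):
    """Express each row as percentages of that ground-truth class."""
    totals = matrix.sum(axis=1, keepdims=True)
    return 100.0 * matrix / np.maximum(totals, 1)

# Assumed labels for eight pixels and three classes
truth     = [0, 0, 0, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 1, 1, 2, 0, 2]
matrix = confusion_matrix(truth, predicted, n_classes=3)
overall_accuracy = np.trace(matrix) / matrix.sum()
print(matrix)
print(row_percentages(matrix).round(1))
print("overall accuracy:", overall_accuracy)
```

The off-diagonal entries show which classes are confused with which, information that a single
summary number such as the overall accuracy or an AUC does not carry.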

OUTLOOK 

FTIR imaging has experienced rapid growth in the past 10 years and is increasingly being
applied to biomedical tissue, especially for the analysis of cancer. The major trends emerging in
instrumentation include faster detectors and novel modes of data collection (e.g. time-resolved
imaging) and of sampling (e.g. ATR), as well as new application areas. For biomedical samples, the
information content is quite rich and is often available through simple univariate analysis. For
more complex applications, e.g. cancer diagnoses, the data acquisition, sampling and data analyses
must be integrated in a coherent manner to provide a practical solution. We anticipate that the
technology and its application to biomedical problems will continue to grow with the cooperation of
instrument manufacturers, applications scientists, numerical methods developers and
communities that can utilize the information effectively, e.g. pathologists or surgeons.

SBFP Javelin (Rolling Mode)
- 64 x 64 bump-bonded: 180 Hz (1996)
- 64 x 64: 250 Hz to 315 Hz (1996-98)
- 64 x 64: 430 Hz (2000)
- 64 x 64: 615 Hz, Triggered Mode (2001)
- Step-scan (1997); rapid-scan (1999) imaging

Perkin-Elmer Spotlight
- 16 x 1 "linear" array (2001)
- ~100 spectra/s
- Exceptionally high SNR
- Thermo: 16 x 2 array (2005)

NIH256-SBFP (Snapshot Mode)
- 256 x 256 MBE grown (1997) - NIST
- 256 x 256: 143 Hz capable
- Rapid-scan (2000) imaging
- TRS: 10 ms (2002)
- TRS: 0.1 ms resolution (2004)

Digilab-SBFP Lancer (Snapshot)
- 64 x 64 MBE: 3774 Hz (2002)
- Step-scan imaging (2002)
- Digilab "fast" scan: ~10 s acquisition

FBI-NIH128-RSC (Snapshot)
- 128 x 128 MBE: >16 kHz (2003)
- On-chip co-addition
- Advanced software
  - Spatial subset
  - Trigger
- Rapid-scan imaging (2005)
- Potential
  - Rapid scan: 0.06 s acquisition
  - TRS: 40 microseconds
  - Step-scan: high SNR

- NIH/Akron rapid scan: 0.25 s acquisition

Figure 1. Various MCT FPA detectors employed for FTIR imaging since the first reports using Santa
Barbara Focalplane (SBFP) array detectors. The years in parentheses are the first reports of use for
FTIR imaging. Perkin-Elmer introduced the concept of utilizing a small linear array for very high
signal-to-noise ratios, an approach that has since been adopted by Thermo. Our research efforts have
involved the use of a high-end, custom-built detector that allows for fast imaging.

Figure 2. FTIR spectroscopy and imaging permit examination of molecular conformation changes
preceding and during polymer crystallization. (A) The distribution of crystalline and amorphous
fractions as a function of time for PEO undercooled ~13 °C below its melting point can be observed
by the intensity of any peak that is different (B). The pixels crystallizing first can be
analyzed prior to crystal formation for pre-ordering transitions. (C) Different regions of the
sample have different kinetics (symbols), which are not apparent in the average spectral change
(line). (D) The kinetic data (noisy) can be fit with a smooth curve and the rate of crystallization
obtained. (E) Spatial variation of the crystallization rate (E, left) correlates with the onset of
crystallization (E, right). Those regions that start to crystallize late also have a lower rate and
lower ultimate purity, likely due to diffusion of impurities.

Figure 3. Time-resolved FTIR imaging can provide spatially-resolved, millisecond level dynamics over large sample areas.
The operation (A) of the interferometer is similar to that of conventional step scan spectroscopy, except that an entire image
is acquired for every sampling point. (B) Various functional groups may be monitored in time at specific pixels or (C) the
entire image may be visualized. (D) Entire spectra from pixels may also be observed in the manner of conventional time
resolved spectroscopy.


[Figure 4 flowchart. Recoverable box labels: Acquired Data; Manual Selection; Potential Metric Set; Optimization; Optimal Metric Set; Calibration Statistics; Gold Standard (Prostate Histology); Model; Algorithm (Modified Bayesian); Optimized Prediction Algorithm.]


Figure  4.  Organization  of  data  into  a  prediction  algorithm  involves  several  steps.  Acquired  FTIR  imaging  data  (top,  left)  is  reduced  by 
manual  selection  to  a  set  of  features  that  capture  the  essential  elements  of  spectra  from  all  tissue  types.  A  model  (top,  right)  is  selected 
for  the  data  and  employed  to  develop  an  algorithm.  The  algorithm  is  applied  to  the  entire  metric  set  and  prediction  capabilities  are 
optimized.  Results  of  the  optimization  provide  an  optimal  metric  set  for  validation  studies,  the  parameters  of  the  algorithm  to  be  applied 
and  calibration  classification  statistics.  The  optimized  algorithm  is  applied  to  acquired  data  without  supervision  (figure  5). 


[Figure 5 graphic. Recoverable panel labels: Ground Truth Class; Result of Classification; Confusion Matrices (classes: Epithelium, Mixed Stroma, Fibrous Stroma, Smooth Muscle); Validation/Statistics; Classified Images.]

Figure  5.  Validation  or  unsupervised  application  of  the  developed  protocol.  FTIR  imaging  data  are  acquired  (top,  left),  reduced  to  the 
optimal  metric  set  (obtained  as  in  figure  4),  which  is  then  converted  to  a  single  image  that  denotes  each  cell  type  by  a  specific  color  and 
empty  space  by  black  (top,  right).  Classified  images  can  be  compared  to  ground  truth  images  by  using  confusion  matrices,  ROC  curves 
and  by  comparisons  of  pixels  between  images.  Statistical  measures  from  these  validation  tests  provide  quantifiable  results  and  high 
confidence  in  the  development  of  robust  algorithms. 
ABSTRACT

Fourier  transform  infrared  (FT-IR)  spectroscopic  imaging  is  an  emerging  technique  that  combines  the  molecular 
selectivity  of  spectroscopy  with  the  spatial  specificity  of  optical  microscopy.  We  demonstrate  a  new  concept  in  obtaining 
high  fidelity  data  using  commercial  array  detectors  coupled  to  a  microscope  and  Michelson  interferometer.  Next,  we 
apply  the  developed  technique  to  rapidly  provide  automated  histopathologic  information  for  breast  cancer.  Traditionally, 
disease  diagnoses  are  based  on  optical  examinations  of  stained  tissue  and  involve  a  skilled  recognition  of  morphological 
patterns  of  specific  cell  types  (histopathology).  Consequently,  histopathologic  determinations  are  a  time  consuming, 
subjective  process  with  innate  intra-  and  inter-operator  variability.  Utilizing  endogenous  molecular  contrast  inherent  in 
vibrational  spectra,  specially  designed  tissue  microarrays  and  pattern  recognition  of  specific  biochemical  features,  we 
report  an  integrated  algorithm  for  automated  classifications.  The  developed  protocol  is  objective,  statistically  significant 
and,  being  compatible  with  current  tissue  processing  procedures,  holds  potential  for  routine  clinical  diagnoses.  We  first 
demonstrate  that  the  classification  of  tissue  type  (histology)  can  be  accomplished  in  a  manner  that  is  robust  and  rigorous. 
Since  data  quality  and  classifier  performance  are  linked,  we  quantify  the  relationship  through  our  analysis  model.  Last, 
we  demonstrate  the  application  of  the  minimum  noise  fraction  (MNF)  transform  to  improve  tissue  segmentation. 

Keywords:  Breast  Cancer,  FT-IR  Spectroscopy,  Hyperspectral,  Histopathology,  Imaging,  Diagnostics,  MNF  Transform 


1.  INTRODUCTION 

As  histologic  analysis  of  biopsied  tissue  forms  the  standard  in  definitive  diagnosis  of  breast  lesions,  it  is  estimated  that 
more  than  1.6  million  women  undergo  breast  biopsies  each  year  in  the  US  alone.  Biopsy  samples  are  fixed  to  ensure 
tissue  stability1  and  then  sectioned  for  staining.2  Microscopic  examinations  of  stained  tissue  sections  by  a  trained 
pathologist  are  the  gold  standard  used  in  diagnosing  breast  cancer.3  Unfortunately,  these  evaluations  are  time  consuming4 
and  do  not  always  lead  to  an  unequivocal  diagnosis.  For  example,  a  study  of  481  breast  cancer  patients  from  1982-2000 
at  a  regional  cancer  center  indicated  that  73%  of  ductal  carcinoma  in  situ  (DCIS)  patients  are  referred  by  a  general 
pathologist  to  an  expert  pathologist  for  review.5  After  review,  43%  of  these  cases  received  different  treatment 
recommendations.  Another  study  found  that  52%  of  cases  referred  to  a  multidisciplinary  tumor  review  board  received 
different  surgery  recommendations.6  Clearly,  the  diagnostic  process  is  sub-optimal.  Rapid,  objective  second  opinions  are 
desirable.  The  use  of  emerging  biological  understanding  and  technologies  for  diagnoses  could  provide  additional 
information  in  tumor  evaluation  and  help  make  accurate  therapy  decisions.  Further,  it  is  likely  that  the  morphologic 
parameters  of  current  diagnoses  are  insufficient  and  additional  information  must  be  added.  This  information  is  typically 
biochemical  in  nature.  For  example,  staining  for  human  epidermal  growth  factor  receptor  2  (HER2)  can  identify  25-30% 
of  breast  cancers.7  Such  examples  of  success,  unfortunately,  are  uncommon  for  cancers  in  complex  tissues.  Hence, 
alternative  methods  are  urgently  required  to  aid  diagnostic  pathology. 

One  such  means  is  the  use  of  molecular  spectroscopy.  For  example,  Fourier  transform  infrared  (FT-IR)  spectroscopy  is 
traditionally  used  for  molecular  identifications  and  biomolecular  structure  elucidations,  but  is  not  currently  applied  in 
clinical  pathology.8  An  IR  spectrum  provides  a  unique  molecular  fingerprint  with  a  quantitative  measure  of  the 
molecular  bonds  present  in  an  examined  material.9  Thus  it  should  give  a  reproducible  measurement  of  tissue 
composition. Tissue, however, is microscopically heterogeneous and the measurement of chemical composition must be
made in the context of knowledge of tissue structure (histology).10 The recent emergence of FT-IR imaging couples
spectroscopy and microscopy to permit rapid acquisition of spectra from tens of thousands of pixels at a high spatial
resolution. Each pixel (spectrum) typically contains thousands of data points in the mid-IR wavelength region
(2-12 µm).11 Automated classification can then be employed for rapid computerized tissue image analysis, as has been
practiced  in  both  the  spectral  processing  and  image  processing  communities.  The  end  goal  of  the  measurement  and 
associated  data  processing  steps  is  to  permit  the  rapid  segmentation  of  different  types  of  tissue  without  the  need  for 
chemical  dyes  or  contrast  agents.10  Last,  the  use  of  FT-IR  imaging  only  involves  light  interacting  with  a  sample  and, 
unlike  conventional  biochemical  analysis  methods,  does  not  alter  the  tissue  in  any  manner.  Thus  it  can  provide  additional 
information  for  pathology  without  the  necessity  of  additional  materials,  tissue  samples  or  changes  in  clinical  protocols. 

In  this  manuscript  we  use  breast  tissue  as  an  example  to  illustrate  the  application  of  FT-IR  imaging  coupled  with 
computerized  classification  for  histopathology.  Specifically,  we  demonstrate  that  a  combination  of  FT-IR  imaging, 
classification  algorithms  and  integrated  computational  methods  for  enhancement  of  acquired  data  can  be  used  in  tandem 
to  optimize  the  development  of  practical  protocols  for  automated  histopathology.  Previous  studies  report  on  the  potential 
for IR spectroscopy in breast pathology,12,13,14,15,16,17 but no complete study on the spectral features of different histologic
types of breast tissue exists. Preliminary efforts indicate significant spectral variation between different types of breast
tissue and breast tumors,18,19,20 but a protocol for clinical translation is lacking. We combine fast FT-IR imaging and
tissue  microarray  sampling  to  demonstrate  the  effectiveness  of  our  approach  for  automated  breast  histopathology  on 
normal  and  malignant  tissue  from  five  patients.  This  approach  is  distinct  from  that  in  Raman  spectroscopy,  where 
histologic  models  are  used  in  analyzing  spectra.21,22  As  a  first  step  towards  automated  tissue  segmentation,  we 
distinguish  breast  stroma  and  epithelium.  This  is  a  critical  step,  as  over  99%  of  breast  tumors  arise  in  the  epithelial  tissue 
lining  milk  ducts  and  lobules.23  False  color  classified  images  denoting  stroma  and  epithelium  are  produced,  followed  by 
analysis  of  data  collection  parameters.  We  evaluate  the  impact  of  spectral  resolution  and  noise  on  classification  accuracy 
to  demonstrate  potential  for  faster  data  acquisition  without  loss  in  classification  confidence.  This  study  presents  an  initial 
effort  in  developing  applications  for  FT-IR  imaging  in  clinical  pathology. 


2.  METHODOLOGY 


2.1  Data  Acquisition 

The  first  studies  to  examine  IR  spectra  of  tissue  began  over  fifty  years  ago,24  but  the  field  did  not  truly  make  progress 
due  to  limitations  in  instrumentation.  Today,  a  combination  of  an  IR  microscope,  Michelson  interferometer  and  focal 
plane  array  (FPA)  detector25  permits  efficient  data  acquisition  for  large  sample  areas.  The  data  presented  in  this  study  is 
collected using the Perkin-Elmer Spotlight 400 imaging spectrometer. A spatial pixel size of 6.25 µm and a spectral
resolution of 4 cm⁻¹ were employed, with 2 scans averaged for each pixel. An IR background is collected with 120 scans
co-added at a location on the substrate where no tissue is present. No undersampling was employed in data acquisition
and a Norton-Beer (NB) medium apodization function was used. A ratio of the background to tissue spectra is then computed to remove
substrate  and  air  contributions  to  the  spectral  data.  The  Spotlight  software  atmospheric  correction  algorithm  is  applied  to 
eliminate  remaining  atmospheric  contributions  to  the  tissue  spectra.  As  opposed  to  other  configurations  that  employ  a 
large  FPA  detector,  this  instrument  employs  a  linear  array  detector  that  is  raster  scanned  to  acquire  data  from  large 
sample  areas.  We  use  a  combination  of  instrument  control  and  post-processing  software  to  computationally  re-organize 
data  acquired  into  large  image  sizes.  Images  of  stained  tissue  are  acquired  using  a  standard  Zeiss  optical  microscope. 
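The background ratio described above can be illustrated with a minimal sketch. It assumes single-beam intensities are available as NumPy arrays and uses the standard definition A = log10(I0 / I); the function name `absorbance` and the toy values are hypothetical and are not taken from the Spotlight software.

```python
import numpy as np

def absorbance(sample_single_beam, background_single_beam, eps=1e-12):
    """Convert single-beam intensities to absorbance, A = log10(I0 / I)."""
    sample = np.asarray(sample_single_beam, dtype=float)
    background = np.asarray(background_single_beam, dtype=float)
    # Clip to avoid dividing by zero or taking the log of a non-positive value.
    ratio = np.clip(background, eps, None) / np.clip(sample, eps, None)
    return np.log10(ratio)

# Hypothetical toy data: one background spectrum (3 bands) and a 2 x 2 pixel image.
background = np.array([1.0, 2.0, 4.0])
sample = np.tile([0.1, 2.0, 0.4], (2, 2, 1))   # shape (rows, cols, bands)
A = absorbance(sample, background)             # background broadcasts over pixels
```

Because the background is measured where no tissue is present, the ratio cancels contributions common to both measurements, which is why a single background spectrum can be applied to every pixel.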

2.2  Tissue  sampling 

Tissue  microarrays  (TMAs)  permit  facile  comparison  of  small  tissue  samples  from  numerous  patients26  and  are  an 
especially  useful  sampling  medium  for  spectroscopic  analyses.27  A  TMA  contains  numerous  small  round  tissue  samples, 
termed  cores,  which  are  extracted  from  biopsy  samples  from  different  patients.  Two  paraffin-embedded  TMAs  were 
obtained  from  a  commercial  source  (US  Biomax)  for  this  study.  The  first  TMA  section  is  placed  on  a  glass  slide  and 
stained  with  hematoxylin  and  eosin  (H&E)  dyes.  In  H&E  staining,  hematoxylin  stains  nucleic  acids  and  eosin  stains 
protein-rich  tissue  regions.  This  section  is  used  for  visual  morphology  interpretation  by  a  pathologist.  The  second  TMA 
section  is  placed  on  a  barium  fluoride  (BaF2)  substrate  for  FT-IR  imaging.  Though  the  arrays  contained  a  large  number 
of samples, a smaller subset of malignant and normal tissue cores from five patients with invasive ductal carcinoma
(IDC) is selected for this study as the illustrative example. Each of the ten cores is 1.5 mm in diameter; hence, at a 6.25
µm pixel size, approximately 280,000 spectra are collected for each core. This results in the collection of over 560,000
spectra for each patient and approximately 2.8 million total spectra for all ten cores. This large spectral dataset facilitates
rigorous validation of classification protocols at a pixel level. Paraffin is removed from the TMA by immersion in
hexane with continuous stirring at 40 °C for 48-72 hours. Spectra are recorded at several locations on the TMA every 24
hours during this period to monitor paraffin removal through the disappearance of the 1462 cm⁻¹ peak.


2.3  Image  analysis  and  classification 


A  supervised  segmentation  method  is  used  for  FT-IR  image  classification.  This  algorithm  has  been  described  in  detail 
elsewhere,28  but  is  based  on  a  modified  version  of  a  Bayesian  classifier.  First,  the  spectral  profile  of  1641  bands  is 
reduced  to  a  set  of  89  useful  metrics  by  examination  of  spectra  from  manually  selected  stroma  and  epithelium  tissue 
regions.  Metrics  are  manually  selected  to  include  peak  ratios,  peak  areas,  and  peak  centers  of  gravity.  A  metric  profile  M 
is  generated  for  each  pixel  in  each  tissue  image  of  the  form 

$$M = [m_1, m_2, m_3, \ldots, m_{n_m}], \quad n_m = 89 \qquad (1)$$


where each $m_i$ is the value for a single metric and $n_m$ is the total number of manually selected metrics. Frequency
distributions for stroma and epithelium are determined for each metric and used to estimate the probability of a given
metric profile representing either of these two classes. The probability of an image pixel from each class $c_i$ being
represented by a given metric profile is determined using Bayes' Rule


$$p(c_i \mid M) = \frac{p(M \mid c_i)\, p(c_i)}{p(M)} \qquad (2)$$


where $p(M \mid c_i)$ is estimated from the metric class frequency distributions and $p(M)$ is the probability of a given metric
profile. The prior probability of a particular tissue class, $p(c_i)$, cannot be determined in this model due to the manual
selection of tissue classes on FT-IR images, and is estimated as 0.5. Other ways to estimate or optimize the class prior
probability  may  be  utilized;  we  have  noticed  anecdotally,  however,  that  the  choice  of  this  value  across  a  large  range  does 
not  significantly  affect  the  classification  results.  Classification  accuracy  is  estimated  with  receiver  operating 
characteristic  (ROC)  analysis  for  selected  tissue  regions.  The  area  under  the  ROC  curve  (AUC)  is  used  to  evaluate 
classifier  sensitivity  and  specificity  and  estimate  the  potential  of  the  algorithm  for  accurate  histology  determinations.  The 
classification  algorithm  is  trained  on  a  large  array  dataset  and  separately  validated  on  a  second  array.  It  is  notable  that  we 
do  not  develop  the  entire  classification  algorithm  anew  here.  First,  the  central  idea  of  this  manuscript  is  to  demonstrate 
the  optimization  of  a  developed  protocol  and  second,  the  sample  sizes  chosen  here  are  insufficient  for  de  novo  algorithm 
development. Data is analyzed using the Environment for Visualizing Images (ENVI) software and with programs
written in-house using Interactive Data Language (IDL).
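As an illustration of how Bayes' rule (Equation 2) can be evaluated from per-metric frequency distributions, the following is a minimal sketch in Python rather than IDL. It is not the modified Bayesian classifier of reference 28: it assumes the metrics are statistically independent (a naive Bayes simplification), uses smoothed histograms as the frequency distributions, and runs on synthetic data. All function names are hypothetical.

```python
import numpy as np

def fit_class_histograms(metrics, edges):
    """Per-metric frequency distributions for one class.

    metrics: (n_pixels, n_metrics); edges: one bin-edge array per metric.
    """
    hists = []
    for j, e in enumerate(edges):
        counts, _ = np.histogram(metrics[:, j], bins=e)
        # Add-one smoothing keeps empty bins from producing zero likelihood.
        hists.append((counts + 1.0) / (counts.sum() + len(counts)))
    return hists

def log_likelihood(M, hists, edges):
    """log p(M | c), assuming (for this sketch only) independent metrics."""
    M = np.atleast_2d(M)
    total = np.zeros(M.shape[0])
    for j, (h, e) in enumerate(zip(hists, edges)):
        # Map each value to its bin; values outside the range use the end bins.
        idx = np.clip(np.digitize(M[:, j], e) - 1, 0, len(h) - 1)
        total += np.log(h[idx])
    return total

def posterior(M, hists_by_class, edges, priors):
    """p(c_i | M) from Bayes' rule, normalized over the classes."""
    log_post = np.stack(
        [log_likelihood(M, h, edges) + np.log(p)
         for h, p in zip(hists_by_class, priors)], axis=1)
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Synthetic training metrics for two classes (not real tissue data).
rng = np.random.default_rng(0)
stroma = rng.normal(0.0, 1.0, size=(5000, 3))
epithelium = rng.normal(2.0, 1.0, size=(5000, 3))
edges = [np.linspace(-4.0, 6.0, 41)] * 3
hists = [fit_class_histograms(stroma, edges),
         fit_class_histograms(epithelium, edges)]
pixels = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
post = posterior(pixels, hists, edges, priors=[0.5, 0.5])
```

Normalizing over the classes plays the role of dividing by p(M), so p(M) never has to be estimated separately. Changing `priors` only adds a constant to each class's log posterior, which is consistent with the observation that the prior has little effect when the likelihoods are well separated.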


2.4  Spectral  resolution  and  noise  analysis 

Spectral  resolution  and  noise  are  two  common  experimental  variables  that  affect  results  in  IR  spectral  analyses.  The 
effects  of  spectral  resolution  and  spectral  noise  are  evaluated  here  in  the  context  of  quantitative  histologic  segmentation 
to minimize data collection time. As per the trading rules of IR spectroscopy, data collection time is expected to decrease
linearly as spectral resolution is coarsened and quadratically with reduction in signal-to-noise ratio (SNR).29 Ideally, these
parameters  would  be  analyzed  by  acquiring  data  at  different  spectral  resolutions  and  numbers  of  spectral  co-adds. 
However,  the  time  required  to  collect  multiple  images  for  the  TMA  is  prohibitive.  Instead,  computational  methods  are 
used to examine these parameters using the original FT-IR images acquired at 4 cm⁻¹ and 2 scans per pixel. First, spectral
resolution is evaluated by downsampling the data using a neighbor binning procedure to resolutions of 8, 16, 32, 64 and
128 cm⁻¹. Classification is then performed on downsampled datasets to determine the coarsest spectral resolution that permits
satisfactory stroma and epithelium segmentation. For a fine spectral resolution data set at 4 cm⁻¹, the effect of noise is
evaluated  by  adding  to  each  spectrum  noise  in  Gaussian  distributions  with  standard  deviations  of  0.001,  0.01,  and  0.1  au. 
Classification  accuracy  is  estimated  by  evaluating  the  AUC  at  each  noise  standard  deviation.  Computational  noise 
reduction  with  the  minimum  noise  fraction  (MNF)  transform30  is  evaluated  by  reducing  noise  in  all  the  data  sets. 
Classification  is  performed  with  the  same  algorithm  on  these  MNF  transformed  images  to  determine  the  impact  of  this 
noise  reduction  algorithm  on  stroma  and  epithelium  segmentation. 
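The MNF-based noise reduction can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the ENVI implementation used in the study: the noise covariance is estimated from differences between horizontally adjacent pixels, and the number of retained components is chosen by hand. The function name `mnf_denoise` and the synthetic image are hypothetical.

```python
import numpy as np

def mnf_denoise(cube, n_components):
    """Denoise a (rows, cols, bands) cube by keeping the top MNF components."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Assumed noise estimate: differences between horizontally adjacent pixels.
    noise = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, bands) / np.sqrt(2.0)
    noise_cov = np.cov(noise, rowvar=False)
    data_cov = np.cov(Xc, rowvar=False)
    # Step 1: whiten the noise so its covariance becomes the identity.
    n_vals, n_vecs = np.linalg.eigh(noise_cov)
    whiten = n_vecs / np.sqrt(np.clip(n_vals, 1e-12, None))
    # Step 2: PCA of the noise-whitened data, ordered by decreasing SNR.
    s_vals, s_vecs = np.linalg.eigh(whiten.T @ data_cov @ whiten)
    s_vecs = s_vecs[:, np.argsort(s_vals)[::-1]]
    forward = whiten @ s_vecs
    inverse = np.linalg.inv(forward)
    # Step 3: zero the low-SNR components and transform back.
    Y = Xc @ forward
    Y[:, n_components:] = 0.0
    return (Y @ inverse + mean).reshape(rows, cols, bands)

# Synthetic image: two spectra mixed along the row direction, plus white noise.
rng = np.random.default_rng(0)
rows, cols, bands = 32, 32, 20
axis = np.linspace(0.0, 1.0, bands)
spec_a = np.exp(-((axis - 0.3) / 0.1) ** 2)
spec_b = np.exp(-((axis - 0.7) / 0.1) ** 2)
frac = np.linspace(0.0, 1.0, rows)[:, None, None]
clean = np.broadcast_to(frac * spec_a + (1.0 - frac) * spec_b, (rows, cols, bands))
noisy = clean + rng.normal(0.0, 0.05, size=clean.shape)
denoised = mnf_denoise(noisy, n_components=2)
err_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
err_denoised = np.sqrt(np.mean((denoised - clean) ** 2))
```

In this toy case the denoised image is closer to the noise-free image than the noisy input is. How many components to retain for real tissue data is a separate choice that this sketch does not address.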
3. DATA


The  classification  model  presented  in  this  manuscript  involves  segmentation  of  stroma  and  epithelium,  which  are  the  two 
most  prominent  tissue  classes  in  fixed  breast  tissue  used  for  pathology  evaluation.31  In  practice,  the  recognition  of 
epithelial  cells  is  especially  critical  for  cancer  diagnoses,  as  the  vast  majority  (>99%)  of  breast  cancers  arise  in  this  cell 
type.23  Hence,  the  two  class  model  is  of  practical  significance.  While  seemingly  simple  and  practical,  however,  the 
model  can  potentially  be  confounding  as  stroma  consists  of  many  cell  types  with  disparate  spectral  characteristics.  This 
model  was  employed  to  develop  a  classifier  using  training  data  from  a  TMA  with  forty  patients.  Final  model  calibration 
for  sixty  eight  tissue  cores  yielded  an  AUC  value  of  0.99  with  an  eight  metric  classifier.32,33  In  this  study  we  validate  this 
classifier  with  one  malignant  and  a  matched  normal  TMA  core  from  a  subset  of  five  patients.  As  seen  in  Figure  1A  and 
B,  absorbance  images  based  on  spectral  features  closely  compare  with  images  of  H&E  stained  tissue.  Hence,  using 
conventional  pathology  knowledge  we  can  select  image  pixels  that  unequivocally  correspond  between  the  two  images  - 
representing both stroma and epithelium. These pixels are selected by examining FT-IR images at 1080 cm⁻¹ to highlight
asymmetric PO2 stretching vibrations in glycoprotein in epithelium,14 1236 cm⁻¹ to highlight CH2 wagging vibrations
associated with collagen proteins,34 1652 cm⁻¹ to highlight C=O stretching vibrations at the protein amide I mode,34 and
3292 cm⁻¹ to highlight NH bending vibrations at the protein amide A mode (shown as an example in Figure 1B).35 We
emphasize  that  multiple  vibrational  modes  must  be  examined  in  tandem  and  pixels  identified  with  great  care  and 
diligence  as  these  form  the  gold  standard  for  future  comparisons.  Over  185,000  pixels  are  marked  in  these  ten  tissue 
cores  to  serve  as  the  gold  standard  for  ROC  analysis  (as  shown  in  Figure  1C).  Selecting  this  large  set  of  pixels  is 
important  to  achieve  a  reasonable  sample  size  to  accurately  estimate  classification  potential  for  the  entire  data  set. 
Boundary  pixels  are  not  marked  to  avoid  errors  associated  with  mixed  pixels  in  FT-IR  images.27  A  qualitative 
comparison of stained and classified images indicates that stroma and epithelium segmentation is reasonable (Figure
1D), and this is confirmed with an AUC value of 0.98 after quantitative ROC analysis. Stroma and epithelium are easily
identified  on  false  color  classified  images  without  detailed  examination  and  interpretation.  This  is  advantageous  over 
traditional  staining  methods  that  require  the  use  of  chemical  dyes  and  subsequent  expert  pathologist  examination  for 
evaluation. 
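For readers unfamiliar with ROC analysis, the sketch below shows one common way to compute the AUC from per-pixel classifier scores and gold standard labels. It is an illustration only, with a hypothetical function name and made-up scores; it is not the ENVI/IDL code used to obtain the values reported here.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve from scores and boolean ground truth labels."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="mergesort")   # descending score
    s, y = scores[order], labels[order]
    tp = np.cumsum(y)        # true positives as the threshold is lowered
    fp = np.cumsum(~y)       # false positives as the threshold is lowered
    # Keep one ROC point per distinct score so tied scores are treated fairly.
    last = np.r_[np.nonzero(np.diff(s))[0], len(s) - 1]
    tpr = np.r_[0.0, tp[last] / tp[-1]]
    fpr = np.r_[0.0, fp[last] / fp[-1]]
    # Trapezoidal integration of TPR over FPR.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Made-up scores for four gold standard pixels (True = epithelium).
auc_perfect = roc_auc([0.9, 0.8, 0.7, 0.6], [True, True, False, False])
auc_partial = roc_auc([0.1, 0.4, 0.35, 0.8], [False, False, True, True])
```

An AUC of 1.0 corresponds to perfect separation of the two classes and 0.5 to chance, which is the scale on which the values in this manuscript should be read.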


[Fig. 1 image. Scale bar: 1 mm. Legend: epithelium, stroma.]


Fig.  1.  Conventional  H&E  stained  images,  FT-IR  spectral  images  and  classification.  (A)  An  H&E  stained  image  of  tissue 
cores  from  five  invasive  ductal  carcinoma  patients.  Each  row  represents  a  single  patient,  with  malignant  tissue  samples 
on the left and normal samples on the right. (B) An FT-IR image at 3292 cm⁻¹ denotes the NH bending vibration at the
amide A protein mode. Brighter regions denote relatively protein-rich stroma. (C) A ground truth FT-IR image with
pixels  marked  as  stroma  or  epithelium  serves  as  the  gold  standard  for  ROC  analysis  and  classification  evaluation.  (D)  A 
classified  FT-IR  image  in  which  all  pixels  are  labeled  as  stroma  or  epithelium  accurately  corresponds  to  the  H&E 
stained  image.  The  classification  does  not  require  stains  or  human  interpretation. 
4. RESULTS


4.1  Effect  of  spectral  resolution  on  tissue  segmentation 

The  impact  of  spectral  resolution  on  classification  performance  is  evaluated  by  downsampling  spectra  at  every  pixel  with 
a neighbor binning and interpolation procedure. FT-IR image data sets are acquired at 4 cm⁻¹ spectral resolution and are
downsampled to 8, 16, 32, 64, and 128 cm⁻¹ resolution. As seen in Figure 2A, an average spectrum at each resolution
from epithelial cells in the gold standard demonstrates that important spectral elements remain identifiable at coarser
resolutions. While we anticipate that the area under the peaks would be preserved, peak shapes begin to change at a
coarser spectral resolution of 32 or 64 cm⁻¹ due to overlaps in the complicated spectral response. The most robust
predictors of class are likely those that best accommodate both biological diversity and spectral noise
(arising from both measurement and artifacts). Hence, we anticipate that the use of these metrics would also prove robust
when spectra are downsampled. Figure 2B demonstrates that the classification accuracy is not significantly affected until
the spectral resolution is coarsened to 128 cm⁻¹.
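A minimal sketch of neighbor binning along the spectral axis is given below. It assumes that binning means averaging adjacent spectral points and omits the interpolation step mentioned above; the function name `bin_spectra` is hypothetical. With this convention, a factor of 2 takes 4 cm⁻¹ data to a nominal 8 cm⁻¹, a factor of 16 to 64 cm⁻¹, and so on.

```python
import numpy as np

def bin_spectra(cube, factor):
    """Average groups of `factor` adjacent spectral points (last axis)."""
    cube = np.asarray(cube, dtype=float)
    bands = cube.shape[-1]
    usable = (bands // factor) * factor        # drop any leftover points
    trimmed = cube[..., :usable]
    grouped = trimmed.reshape(*cube.shape[:-1], usable // factor, factor)
    return grouped.mean(axis=-1)

# Toy single-pixel "image" with 8 spectral points.
cube = np.arange(8.0).reshape(1, 1, 8)
binned_2 = bin_spectra(cube, 2)    # nominal 4 -> 8 cm^-1
binned_4 = bin_spectra(cube, 4)    # nominal 4 -> 16 cm^-1
```

Averaging preserves the mean absorbance within each bin, which is consistent with the expectation that peak areas survive downsampling better than peak shapes.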

The result is indeed surprising as numerous prior biomedical studies with vibrational spectroscopy have employed 4 cm⁻¹
to 16 cm⁻¹ spectral resolution. There are two important differences between the problem here and a majority of those
studies. First, many of the reported studies used sensitive spectral analysis tools (e.g. second derivatives) or were looking
for fine spectral features. Second, models for pathology may have needed more complex information. Here, we are
examining a two-class problem of very distinct cell types. Hence, the acceptable classification at very coarse resolutions is
likely permitted by the significant biochemical differences between stroma and epithelium in the metrics selected.
Previous studies have provided evidence of clear differences in IR spectra from DNA-rich tissues such as epithelium and
RNA and protein-rich tissues such as stroma,14,20 especially in the IR fingerprint region from 500-1500 cm⁻¹.8 We
hypothesize  that  a  more  complex  model  with  additional  tissue  classes  would  likely  require  a  higher  spectral  resolution 
for  reasonable  classification,  but  that  this  resolution  is  not  required  to  distinguish  stroma  and  epithelium. 

A  powerful  feature  of  the  algorithm  we  employ  is  the  utilization  of  prominent  spectral  features  for  classification.  Here, 
the  features  selected  as  classification  metrics  are  not  very  sensitive  to  changes  in  spectral  resolution.36  Absorbance  values 
are  accurate  if  the  peak  full  width  at  half  maximum  (FWHM)  is  not  significantly  less  than  the  spectral  resolution.  As 
biological  materials  have  broad  and  overlapping  lineshapes,  the  condition  holds  even  for  very  coarse  resolutions. 
Therefore,  the  values  of  spectral  metrics  are  not  significantly  altered  even  if  some  details  in  the  spectrum  are  affected  at 
coarser  spectral  resolutions.  The  center  of  gravity  metrics  used  for  classification  are  particularly  robust,  as  they 
incorporate  peak  position  and  shape  and  are  not  strongly  influenced  by  peak  modifications  in  downsampled  spectra.  Care 
must  be  exercised  in  making  this  extrapolation  to  all  data  quality.  For  example,  for  poor  signal  to  noise  ratio  spectra,  the 
center  of  gravity  calculation  will  be  sensitive  to  noise. 
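To make the center of gravity metric concrete, the sketch below computes an absorbance-weighted mean wavenumber over a spectral window. The exact definition used for the classification metrics is not given in this section, so this formula, the function name and the synthetic band are assumptions for illustration only.

```python
import numpy as np

def center_of_gravity(wavenumbers, absorbance, lo, hi):
    """Absorbance-weighted mean wavenumber within the window [lo, hi]."""
    wn = np.asarray(wavenumbers, dtype=float)
    a = np.asarray(absorbance, dtype=float)
    mask = (wn >= lo) & (wn <= hi)
    weights = a[mask]
    return float(np.sum(wn[mask] * weights) / np.sum(weights))

# Synthetic band centered at 1652 cm^-1, sampled every 4 cm^-1.
wn = np.arange(1500.0, 1801.0, 4.0)
band = np.exp(-0.5 * ((wn - 1652.0) / 30.0) ** 2)
cog = center_of_gravity(wn, band, 1592.0, 1712.0)
```

Because the metric is a weighted sum over many points, it depends on the overall position and shape of a band rather than on any single data point. The same summation also shows why noise in low-SNR spectra enters the metric directly through the weights.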




Fig. 2. Spectral resolution effect on classification. (A) Epithelial spectra obtained by downsampling data acquired at 4 cm⁻¹
indicate that IR spectrum quality degrades appreciably at a spectral resolution coarser than 16 cm⁻¹, as anticipated for
condensed phase biological materials. (B) AUC analysis for stroma and epithelium segmentation for each resolution
demonstrates a significant decrease in classification accuracy only at a very coarse spectral resolution beyond 64 cm⁻¹.
The effective classification in downsampled FT-IR images presented in this manuscript indicates potential for faster data
acquisition without significant loss in classification accuracy. Figure 2 suggests that no significant classification
differences are observed in images up to 64 cm⁻¹. Since data acquisition time is estimated to decrease linearly with
spectral resolution,29 FT-IR images could be acquired 16 times as fast without any loss in classification performance for
the  two  class  model  presented  in  this  manuscript.  Again,  we  emphasize  that  the  results  are  preliminary  and  should  be 
carefully  validated.  Nevertheless,  the  idea  of  optimizing  data  acquisition  by  modeling  the  results  of  other  experimental 
conditions  is  an  important  one  that  should  be  pursued  in  practical  translation  of  these  protocols  for  clinical  use. 
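The arithmetic behind the 16-fold estimate follows directly from the trading rules stated in Section 2.4. The sketch below encodes them as a simple scaling relation (time proportional to SNR squared and inversely proportional to the resolution value in cm⁻¹); it ignores instrument overheads such as stage motion and is not a model of the Spotlight instrument. The function name is hypothetical.

```python
def relative_acquisition_time(res_ref, res_new, snr_ref=1.0, snr_new=1.0):
    """Acquisition time relative to a reference condition.

    res_* are spectral resolutions in cm^-1 (larger means coarser);
    snr_* are signal-to-noise ratios in the same arbitrary units.
    """
    return (res_ref / res_new) * (snr_new / snr_ref) ** 2

# Going from 4 cm^-1 to 64 cm^-1 at fixed SNR.
t_coarse = relative_acquisition_time(4.0, 64.0)
speedup = 1.0 / t_coarse
# Accepting half the SNR at fixed resolution.
t_half_snr = relative_acquisition_time(4.0, 4.0, snr_ref=1.0, snr_new=0.5)
```

Under these assumptions the two savings multiply, so a protocol that tolerates both a coarser resolution and a lower SNR could in principle be shortened by the product of the two factors.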


4.2  Effect  of  spectral  noise  on  tissue  segmentation 

Evaluation  of  acceptable  spectral  noise  for  FT-IR  image  classification  is  important  for  efficient  data  collection.  For 
practical  applications,  it  is  advantageous  to  acquire  data  with  the  lowest  SNR  that  permits  reasonable  classification.  Raw 
data  is  acquired  with  a  peak-to-peak  noise  value  of  0.011  au,  a  root  mean  square  (rms)  noise  value  of  0.008  au,  and  an 
average  amide  I  height  of  0.328  au.  To  assess  the  impact  of  spectral  noise  on  classification  accuracy,  Gaussian  noise  is 
added  with  a  standard  deviation  of  0.001,  0.01,  and  0.1  au.  Figure  3  provides  a  qualitative  evaluation  of  histologic 
images  from  the  acquired  data  set  (Figure  3A)  and  from  the  data  sets  with  added  Gaussian  noise  (Figures  3B-D). 

These  images  indicate  that  acceptable  classification  is  achieved  when  noise  is  added  at  a  standard  deviation  of  0.001  au 
(Figure  3B),  but  that  classification  accuracy  appreciably  decreases  with  the  addition  of  noise  at  or  above  a  standard 
deviation  of  0.01  au.  This  is  expected,  since  adding  noise  at  a  standard  deviation  of  0.001  au  does  not  significantly 
change  the  FT-IR  image  data  SNR.  The  data  set  with  noise  added  at  a  standard  deviation  of  0.01  au  (Figure  3C)  produces 
a  classified  image  with  regions  of  distinguishable  stroma  and  epithelium,  although  there  are  numerous  stray  pixels  that 
are not correctly classified, similar to salt and pepper noise. Upon the addition of noise of ~0.1 au, classified images
become  completely  indistinguishable  (Figure  3D),  including  the  misidentification  of  many  pixels  on  the  empty  region  of 
the  slides  as  tissue.  This  loss  in  classification  accuracy  is  caused  by  an  underlying  broadening  of  spectral  metric 
distributions  for  each  class.  This  broadening  bridges  the  difference  in  metric  values.  The  overlap  in  values  in  turn 
decreases  classification  confidence  as  measured  by  the  AUC.  Hence,  we  have  used  the  AUC  as  a  reasonable  measure  of 
the  classification  accuracy  at  every  experimental  condition. 
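The noise experiment can be sketched numerically. The example below adds zero-mean Gaussian noise to an image and estimates the resulting SNR from the reported amide I height (0.328 au) and rms noise (0.008 au). It assumes that the added noise is independent of the existing noise, so the two combine in quadrature, and it defines SNR as peak height divided by rms noise. Both are assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def add_gaussian_noise(cube, sigma, rng):
    """Return a copy of the data with zero-mean Gaussian noise added."""
    return cube + rng.normal(0.0, sigma, size=np.shape(cube))

def snr_after_added_noise(signal, rms_noise, sigma_added):
    """SNR when independent added noise combines in quadrature with existing noise."""
    return signal / np.hypot(rms_noise, sigma_added)

amide_i_height, rms_noise = 0.328, 0.008     # values reported in the text (au)
snr_raw = amide_i_height / rms_noise
snr_by_sigma = {s: snr_after_added_noise(amide_i_height, rms_noise, s)
                for s in (0.001, 0.01, 0.1)}

# Check on synthetic data that the added noise has the requested spread.
rng = np.random.default_rng(0)
noisy = add_gaussian_noise(np.zeros((50, 50, 20)), 0.01, rng)
```

Under these assumptions the estimated SNR is about 41 for the raw data and stays near 40 after adding noise at 0.001 au, but drops to roughly 26 at 0.01 au and to about 3 at 0.1 au. This matches the qualitative trend in the classified images.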

A  plot  of  AUC  against  the  added  noise  (Figure  3E)  demonstrates  that  the  AUC  value  remains  relatively  constant  with  the 
addition  of  low  levels  of  noise.  It  then  decreases  to  a  mean  AUC  of  0.77  with  the  addition  of  noise  at  a  standard 
deviation  of  0.01  au  and  falls  to  a  mean  AUC  of  ~0.5  at  a  noise  standard  deviation  of  0.1  au.  It  is  surprising  that  the 
stroma  AUC  actually  falls  below  0.5.  Though  the  AUC  values  should  not  be  below  0.5  for  classified  images,  our 
algorithm  contains  a  pixel  rejection  step.  A  pixel  is  rejected  if  the  measured  metric  values  do  not  lie  within  the  prior 
probability distributions. Hence, a small number of pixels are rejected at low noise levels and are not accounted for.
Fig. 3. Effect of noise on FT-IR image classification. Classified images are shown for (A) raw data, (B) data with Gaussian
noise  added  at  a  standard  deviation  of  0.001  au,  (C)  data  with  Gaussian  noise  added  at  a  standard  deviation  of  0.01  au, 
and  (D)  data  with  Gaussian  noise  added  at  a  standard  deviation  of  0.1  au.  (E)  The  AUC  values  for  classification  with 
noise  added  at  a  standard  deviation  of  0.001,  0.01,  and  0.1  au  confirm  that  classification  accuracy  is  reasonable  with  a 
small  amount  of  additional  noise  but  unsatisfactory  in  data  with  a  noise  standard  deviation  at  or  above  0.01  au. 

For the two-class stroma and epithelium segmentation model presented in this manuscript, an AUC value of 0.77 does
not indicate sufficient classification confidence. We would expect nearly perfect discrimination of these two types of
tissue since there are numerous spectral features that distinguish epithelium and stroma.14,20,32,34 An estimated
classification accuracy (AUC) of 0.5 for this model is equivalent to random guessing and does not provide any information
about tissue histology. Examination of the curve in Figure 3E indicates that some additional spectral noise, at a level of
0.001 au, can be present without loss in classification accuracy for this two-class model. We did not observe any difference
in this behavior with pathology of the tissue. Breast tumor tissue is often very heterogeneous, and precise pixel
classification is needed to produce reasonable automated classification results. Hence, these results represent a good
starting point to optimize a practical protocol. There may also be a patient or clinical setting dependence of these optimal
operating points that remains to be probed. From the plot, it is likely that we are close to the operating point of a practical
protocol, as addition of a small amount of noise (at or above 0.01 au) makes the classification unstable.

Last, the classification algorithm was optimized using a noise level similar to that of the acquired data set presented in
this manuscript. Hence, the optimal metric sets and discriminant function are obtained for that noise level. It would not
be surprising if de novo training and optimization on lower-quality data yielded similar results. De novo
classification algorithm development, however, is not guaranteed to produce equivalent results for the higher noise cases
and will fail where overlap between the prior distributions is significant due to noise broadening. Hence, we believe that
the conditions found here are close to optimal.

4.3  Noise  reduction  with  the  MNF  transform 

In this manuscript, we have used an instrument with a high-performance detector that has a low multichannel detection
advantage. FT-IR imaging using large focal plane array (FPA) detectors, however, is a promising avenue for rapid data
acquisition due to the large multichannel advantage. Imaging with FPAs, unfortunately, often results in data with a low
signal-to-noise ratio (SNR) due to the poor detector characteristics and other limitations.37 From the trading rules of FT-IR
spectroscopy,29 achieving a factor of n improvement in SNR would result in an n^2-fold increase in data collection time. An
alternative route to improving SNR is to employ post-processing algorithms to reduce noise. One such avenue for noise
reduction is the use of the minimum noise fraction (MNF) transform. The MNF transform can be used in a mathematical
procedure to remove uncorrelated contributions from the spatial and spectral domains. First, a forward transform is used
to perform a factor analysis and re-order spectral data in the order of decreasing SNR. This MNF calculation is a two-step
process: a noise covariance matrix is estimated and used to decorrelate and rescale the noise in the data, and
a standard PCA is then performed on the noise-whitened data. Second, only those factors that correspond to a
sufficiently high SNR are selected by examining the eigenvalue images. The first few eigenvalue images generally correspond to
higher SNR values and contain most of the useful information. Noise reduction is achieved by suppressing the later
factors corresponding largely to noise or zero-filling components and inverse transforming the data. A noise reduction by
a factor greater than 5 could be achieved by this technique if the initial SNR is sufficiently high.38,39 Though the utility of
this method has been demonstrated for IR imaging,40 its use has not been widespread. Further, the use of MNF-transformed data
for tissue classification has not been attempted.
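
The forward transform, factor selection, and inverse transform described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results in this manuscript: the function name `mnf_denoise` and its parameter `n_keep` are hypothetical, the noise covariance is estimated from differences between horizontally adjacent pixels (one common choice, assumed here because the estimator is not specified above), and the retained factors are chosen by a fixed count rather than by inspecting eigenvalue images.

```python
import numpy as np


def mnf_denoise(cube, n_keep):
    """Denoise a (rows, cols, bands) image cube by keeping the first n_keep MNF factors."""
    data = np.asarray(cube, dtype=float)
    rows, cols, bands = data.shape
    X = data.reshape(-1, bands)
    mean = X.mean(axis=0)
    Xc = X - mean

    # Assumed noise estimate: differences of horizontally adjacent pixels.
    # The difference of two independent noise samples has twice the noise variance.
    diff = (data[:, 1:, :] - data[:, :-1, :]).reshape(-1, bands)
    noise_cov = np.cov(diff, rowvar=False) / 2.0

    # Step 1: decorrelate and rescale (whiten) the noise.
    n_vals, n_vecs = np.linalg.eigh(noise_cov)
    n_vals = np.maximum(n_vals, 1e-12 * n_vals.max())  # guard against zero variance
    whiten = n_vecs / np.sqrt(n_vals)                  # E @ diag(1 / sqrt(lambda))
    unwhiten = (n_vecs * np.sqrt(n_vals)).T            # exact inverse of `whiten`

    # Step 2: standard PCA on the noise-whitened data, sorted by decreasing SNR.
    p_vals, p_vecs = np.linalg.eigh(np.cov(Xc @ whiten, rowvar=False))
    p_vecs = p_vecs[:, np.argsort(p_vals)[::-1]]

    forward = whiten @ p_vecs        # spectra -> MNF factors
    inverse = p_vecs.T @ unwhiten    # MNF factors -> spectra

    # Suppress the later, noise-dominated factors and inverse transform.
    scores = Xc @ forward
    scores[:, n_keep:] = 0.0
    return (scores @ inverse + mean).reshape(rows, cols, bands)
```

A call such as `mnf_denoise(cube, n_keep=10)` returns a cube of the same shape. Setting `n_keep` equal to the number of bands reproduces the input, which is a convenient sanity check on the transform pair.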

We  propose  to  use  the  MNF  transform  route  as  a  method  for  fast  data  acquisition  without  loss  in  classification  accuracy. 
The  protocol  involves  rapid  data  collection  at  a  low  SNR,  followed  by  application  of  MNF  transform  for  noise  reduction. 
Classification  is  then  performed  on  these  noise-reduced  images.  It  must  be  noted  that  the  gain  here  is  through 
computational  techniques  and  does  not  involve  changes  in  instrumentation  hardware  or  data  acquisition  time.  A 
secondary  advantage  that  may  arise  is  that  decreasing  the  variance  in  spectral  data  could  also  decrease  the  biologic 
variance  in  the  data  and  should  improve  separation  of  tissue  classes.  Excessive  image  noise  will  broaden  spectral  metric 
distributions  for  each  class,  which  increases  the  error  associated  with  each  metric  and  decreases  classification 
confidence. Therefore, if the metric distribution mean values for each class are sufficiently different, decreasing noise
will decrease the area of metric distribution overlap and improve segmentation confidence.

The  impact  of  noise  reduction  on  classification  is  demonstrated  in  Figure  4.  The  MNF  transform-based  protocol  is 
applied  to  the  acquired  data  set  and  the  data  sets  with  Gaussian  noise  added  as  discussed  in  the  previous  section. 
Classified  images  are  displayed  for  each  noise  level  after  MNF  transform-aided  noise  reduction  (Figures  4A-D).  The 
AUC  values  for  the  MNF  transformed  image  sets  are  compared  with  the  AUC  values  for  noisy  images  (Figure  4E). 


Proc.  of  SPIE  Vol.  6853  685306-7 


[Figure 4: classified images (panels A-D) and plot of AUC against noise added (panel E; x-axis ticks 1E-4, 1E-3, 0.01, 0.1; legend entry "MNF Transform").]


Fig.  4.  Improvement  in  automated  FT-IR  image  classification  with  the  application  of  the  MNF  transform.  Classified  images 
from  MNF  transformed  FT-IR  images  are  shown  for  (A)  raw  data,  (B)  data  with  Gaussian  noise  added  at  a  standard 
deviation  of  0.001  au,  (C)  data  with  Gaussian  noise  added  at  a  standard  deviation  of  0.01  au,  and  (D)  data  with 
Gaussian  noise  added  at  a  standard  deviation  of  0.1  au.  (E)  Comparing  AUC  values  for  original  FT-IR  images  and 
MNF  transformed  FT-IR  images  demonstrates  that  classification  improves  with  noise  reduction,  especially  when  the 
noise  has  a  standard  deviation  of  0.01  -  0.1  au. 


Evaluation  of  classified  images  and  AUC  values  indicates  that  the  MNF  transform  improves  classifier  performance  for 
each  image.  Given  that  the  classification  accuracy  was  very  high,  the  effects  of  MNF  transform  are  significant  only  when 
the  noise  level  degrades  the  original  data.  Nevertheless,  it  can  be  seen  from  the  figure  that  the  high  accuracy  is  recovered 
for  an  order  of  magnitude  increase  in  data  noise.  Therefore,  application  of  the  MNF  transform  on  data  acquired  with 
these  noise  distributions  will  make  a  significant  difference  in  classifier  performance.  Specifically,  we  can  acquire  data 
with  a  noise  standard  deviation  of  0.01  au  and  provide  accuracy  levels  that  are  comparable  to  those  obtained  in  our 
measurements  of  lower  noise.  This  finding  is  significant  in  that  noise  levels  of  0.01  au  are  commonly  obtained  in  rapidly 
acquired  FT-IR  imaging  data  sets  with  large  array  detectors.  Further,  since  the  classification  accuracy  seems  to  be  little 
affected  by  spectral  resolution,  we  can  anticipate  that  it  will  be  little  affected  by  the  choice  of  an  apodization  function 
and  other  minor  sources  of  error  for  a  reasonable  spectral  resolution.  Hence,  we  contend  that  the  protocol  developed  here 
would  be  well-suited  to  rapid  imaging  with  large  array  detectors. 
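
As a rough, idealized illustration of what this tolerance to noise could mean for acquisition time, the trading rule quoted earlier in this section (a factor of n in SNR costs a factor of n^2 in collection time) can be written out. The symbol t for collection time and the subscripts below are shorthand introduced here, not notation from the cited reference, and the estimate assumes the signal level and all other acquisition parameters are held fixed.

```latex
\mathrm{SNR} \propto \sqrt{t}
\quad\Longrightarrow\quad
\frac{t_{\text{slow}}}{t_{\text{fast}}}
= \left(\frac{\mathrm{SNR}_{\text{slow}}}{\mathrm{SNR}_{\text{fast}}}\right)^{2}
= n^{2},
\qquad n = 10 \;\Rightarrow\; \frac{t_{\text{slow}}}{t_{\text{fast}}} = 100
```

Under these assumptions, accepting an order of magnitude more noise at acquisition would correspond to roughly a hundredfold shorter collection time. Real instruments may deviate from this scaling.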


5.  CONCLUSIONS 

Recent  developments  in  FT-IR  imaging  and  data  processing  facilitate  new  applications  for  this  technology.  In  this 
manuscript,  we  report  an  initial  application  in  automating  histopathology  of  breast  tissue.  Supervised  segmentation  of 
breast stroma and epithelium in FT-IR images is presented and nearly perfect classification accuracy is estimated. The
impacts of spectral resolution and noise on image classification are evaluated. Results in this paper demonstrate that
spectral resolution can be decreased 16-fold without loss in classification accuracy. The classification algorithm is more
sensitive  to  noise,  but  noise  reduction  with  the  MNF  transform  can  improve  classification  accuracy  while  decreasing  the 
time  required  for  data  collection.  This  evaluation  of  the  impact  of  experimental  parameters  on  classification  accuracy 
represents  a  first  step  in  developing  a  practical  protocol  for  rapid  and  automated  histopathology. 